Providing a namespace for third-party configurations


Providing a namespace for third-party configurations

Nicholas Chammas
I discovered today that EMR provides its own optimizations for Spark. Some of these optimizations are controlled by configuration settings with names like `spark.sql.dynamicPartitionPruning.enabled` or `spark.sql.optimizer.flattenScalarSubqueriesWithAggregates.enabled`. As far as I can tell, these are EMR-specific configurations.

Does this create a potential problem, since it's possible that future Apache Spark configuration settings may end up colliding with these names selected by EMR?

Should we document some sort of third-party configuration namespace pattern and encourage third parties to scope their custom configurations to that area? e.g. something like `spark.external.[vendor].[whatever]`.
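To make the proposal concrete, here is a minimal sketch of what such a convention could look like. The prefix `spark.external.` follows the suggestion above; the helper names and the example keys are purely illustrative, not real Spark or EMR settings.

```python
# Sketch of the proposed convention: vendor-specific settings live under
# `spark.external.[vendor].*`, so they cannot collide with keys that
# Apache Spark itself may introduce later. All names are hypothetical.

SPARK_EXTERNAL_PREFIX = "spark.external."

def namespaced_key(vendor: str, setting: str) -> str:
    """Build a vendor-scoped configuration key under the convention."""
    return f"{SPARK_EXTERNAL_PREFIX}{vendor}.{setting}"

def is_vendor_scoped(key: str) -> bool:
    """Check whether a configuration key follows the convention."""
    return key.startswith(SPARK_EXTERNAL_PREFIX)

# A vendor-scoped key under the convention:
print(namespaced_key("emr", "dynamicPartitionPruning.enabled"))
# -> spark.external.emr.dynamicPartitionPruning.enabled

# A key squatting on Spark's own `spark.sql.*` namespace fails the check:
print(is_vendor_scoped("spark.sql.dynamicPartitionPruning.enabled"))
# -> False
```

Under this scheme, any key outside `spark.external.*` would be understood as reserved for Apache Spark itself.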

Nick

Re: Providing a namespace for third-party configurations

Sean Owen-2
It's possible, but pretty unlikely, to have an exact namespace
collision. It's probably a best practice to clearly separate settings
and other downstream add-ons into a separate namespace, and I don't
mind a sentence in a doc somewhere suggesting a convention, but I
really think it's up to downstream projects to decide how they want to
play it. If Spark did add a feature with this exact key, it would be up
to downstream projects to rationalize.

On Fri, Aug 30, 2019 at 9:36 AM Nicholas Chammas
<[hidden email]> wrote:



Re: Providing a namespace for third-party configurations

Jungtaek Lim
My 2 cents: I second Sean, as it's not necessary for downstream projects to follow a rule for config names starting with "spark".

I guess I know why they want it, but then the goal should be clear. To differentiate from existing Spark configs? Using another prefix would help. To integrate smoothly with existing Spark configurations while avoiding the chance of collision? That makes sense, but I'd like to see upstream projects supporting such a case, as I haven't seen such a request.
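The "another prefix" option above can be sketched as follows: a top-level prefix outside `spark.` makes vendor settings trivially distinguishable from anything Apache Spark may ever ship, with no convention needed inside the `spark.*` namespace. The key names here are hypothetical.

```python
# Illustrative sketch: classify configuration keys by their top-level
# prefix. A vendor that uses its own prefix (e.g. `emr.*`) can never
# collide with a future `spark.*` key. Key names are hypothetical.

def owner(key: str) -> str:
    """Classify a configuration key by its top-level prefix."""
    return "spark" if key.split(".", 1)[0] == "spark" else "vendor/other"

print(owner("spark.sql.dynamicPartitionPruning.enabled"))  # spark
print(owner("emr.sql.dynamicPartitionPruning.enabled"))    # vendor/other
```

The trade-off is the one raised above: a non-`spark.` prefix avoids collisions entirely, but the keys no longer blend in with standard Spark configuration.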


On Sat, Aug 31, 2019 at 12:25 AM Sean Owen <[hidden email]> wrote:
