[Standalone Spark] Master Configuration Push-Down


Sean Po
I am running Spark in Standalone mode. When I configure ports (e.g. spark.blockManager.port) in both the Spark Master's spark-defaults.conf and the Spark Worker's, the value from the Spark Master is the one used on all the workers. Judging by the code, this seems to be by design. If executors are small, many of them end up on the same machine, all starting from the same base port, so the 16 ports attempted are exhausted and the remaining executors fail to start. In my case this is made worse by the fact that multiple Spark Workers can run on the same machine.
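To make the exhaustion concrete, here is a minimal sketch of the failure mode, not Spark's actual binding code. It assumes every executor on a host starts from the same pushed-down base port and probes a fixed number of consecutive ports; the function name, the base port 40000, and the figure of 20 executors are made up for illustration, and the 16 attempts are taken from the description above.

```python
def simulate_binds(base_port, max_attempts, n_executors):
    """Model executors on one host that all start from the same base port.

    Each executor probes base_port, base_port + 1, ... for at most
    max_attempts ports and takes the first free one. Returns the sorted
    list of bound ports and the number of executors that could not bind.
    """
    in_use = set()
    failed = 0
    for _ in range(n_executors):
        for offset in range(max_attempts):
            port = base_port + offset
            if port not in in_use:
                in_use.add(port)
                break
        else:
            # Every candidate port was already taken by another executor.
            failed += 1
    return sorted(in_use), failed


# Hypothetical numbers: 20 small executors on one machine, 16 attempts each.
bound, failed = simulate_binds(base_port=40000, max_attempts=16, n_executors=20)
print(len(bound), failed)
```

Under these assumptions only as many executors as there are attempts can bind, no matter how the worker itself is configured, because the base port they all start from is the pushed-down one.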

What are the community's thoughts on changing this behavior as follows?
  1. Port push-down happens only if the Spark Worker's own port configuration is not set. This won't solve the problem, but it will mitigate it, and it seems to make sense from a user-experience point of view.
  2. Similarly, I'd like to be able to prevent environment variable push-down as well.
Alternatively, instead of 1, we could have a configurable switch to turn off push-down of port configuration and a separate one to turn off environment variable push-down. That would work too.
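The proposed precedence could be sketched roughly as follows. This is only an illustration of the semantics being suggested, not existing Spark code: `effective_conf`, `PORT_KEYS`, and the `push_down_ports` switch are hypothetical names, and the rule that non-port keys keep being pushed down is an assumption about the current behavior. A switch for environment variables would follow the same pattern.

```python
# Hypothetical set of keys treated as port settings in this sketch.
PORT_KEYS = {"spark.blockManager.port"}


def effective_conf(pushed, local, push_down_ports=True):
    """Merge pushed-down settings into a worker's local settings.

    pushed: settings arriving via push-down.
    local:  settings from the worker's own spark-defaults.conf.
    push_down_ports: hypothetical switch; when False, port keys are
        never pushed down.
    """
    result = dict(local)
    for key, value in pushed.items():
        if key in PORT_KEYS:
            if not push_down_ports:
                continue  # switch off: ignore pushed-down port settings
            if key in local:
                continue  # option 1: the worker's own port setting wins
        # Non-port keys, and port keys the worker left unset, are pushed down.
        result[key] = value
    return result


pushed = {"spark.blockManager.port": "40000", "spark.executor.memory": "1g"}
local = {"spark.blockManager.port": "41000"}
print(effective_conf(pushed, local))
```

With these inputs the worker keeps its own port of 41000 while still receiving the other pushed-down setting.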

Please share some of your thoughts 😊