[spark][core] SPARK-21097 Dynamic Allocation Pull Request

Bradley Kaiser
Hi all,

I've written a new Spark feature and would love to have a committer take a look at it. The goal is to improve Spark performance under dynamic allocation by preserving cached data when idle executors are released.

The PR and Jira ticket are here:

https://github.com/apache/spark/pull/19041
https://issues.apache.org/jira/browse/SPARK-21097

Notebook users of Spark are the primary target for this change. They generally have periods of inactivity during which their Spark executors could be used for other jobs. Today, if the user has any cached data, those executors are either held idle to keep the cache alive or released at the cost of losing the cached data. This change addresses the problem by replicating cached data to surviving executors before shutting down idle ones.
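To make the idea concrete, here is a toy model in plain Python. It is only a sketch of the general approach described above, not the code in the PR: the function name `drain_idle_executors`, the dict-based cache layout, the "fewest blocks" placement rule, and the block ids are all assumptions made up for illustration.

```python
from typing import Dict, Set


def drain_idle_executors(cache: Dict[str, Set[str]], idle: Set[str]) -> Dict[str, Set[str]]:
    """Return the cache layout after the idle executors are removed.

    `cache` maps an executor id to the ids of the cached blocks it holds.
    Any block that would otherwise be lost is first copied to the
    surviving executor that currently holds the fewest blocks.
    """
    # Copy the surviving executors' block sets so the input is not mutated.
    survivors = {ex: set(blocks) for ex, blocks in cache.items() if ex not in idle}
    if not survivors:
        # Hypothetical policy: refuse rather than silently drop every block.
        raise ValueError("at least one executor must survive")

    for ex in sorted(idle):
        for block in sorted(cache.get(ex, set())):
            if any(block in blocks for blocks in survivors.values()):
                continue  # a surviving replica already exists
            target = min(sorted(survivors), key=lambda s: len(survivors[s]))
            survivors[target].add(block)
    return survivors


# Illustrative layout: three executors, two of which have gone idle.
layout = {
    "exec-1": {"rdd_0_0", "rdd_0_1"},
    "exec-2": {"rdd_0_2"},
    "exec-3": {"rdd_0_3"},
}
after = drain_idle_executors(layout, idle={"exec-2", "exec-3"})
```

In this model the two idle executors can be handed back to the cluster while all four blocks stay cached on `exec-1`. A real implementation also has to deal with memory limits on the survivors and with transfers that fail partway.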

I have conducted some benchmarks showing significant performance gains under the right usage patterns. See the benchmark data here:

https://docs.google.com/document/d/1E6_rhAAJB8Ww0n52-LYcFTO1zhJBWgfIXzNjLi29730/edit?usp=sharing

I tried to mitigate the risk of this code change by keeping the code self-contained and falling back to regular dynamic allocation behavior if there are any issues. The feature should work with any coarse-grained backend, and I have tested it on YARN and standalone clusters.
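The fallback behavior can be pictured with another small Python sketch. Again, this only illustrates the pattern and is not the PR's code: `release_executor`, `replicate`, and `remove` are hypothetical names, and the two callbacks stand in for whatever the scheduler backend actually provides.

```python
from typing import Callable, List


def release_executor(executor_id: str,
                     replicate: Callable[[str], None],
                     remove: Callable[[str], None]) -> bool:
    """Try to preserve cached data, then remove the executor either way.

    Returns True if replication succeeded and False if we fell back to
    plain dynamic allocation behavior (the cached data is lost).
    """
    try:
        replicate(executor_id)
        preserved = True
    except Exception:
        # Any problem at all: give up on replication and proceed as
        # regular dynamic allocation would.
        preserved = False
    remove(executor_id)
    return preserved


removed: List[str] = []


def failing_replicate(executor_id: str) -> None:
    # Stand-in for a replication attempt that does not complete.
    raise TimeoutError("replication did not finish")


ok = release_executor("exec-2", failing_replicate, removed.append)
```

The point of the pattern is that a replication failure never blocks the executor from being released, so the worst case matches what dynamic allocation does today.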

I would love to discuss this change with anyone who is interested. Your attention is greatly appreciated.

Thanks,
Brad Kaiser

