Manual allocation of Spark executors during Spark application runtime

Steven Stetzler
Hi all,

I am wondering if there is a way to manually tune the number of Spark executors when dynamic allocation is enabled on a Spark application. Say, for example, I have a PySpark shell application running on Kubernetes:
```
$ python
>>> from pyspark.sql import SparkSession
>>> spark = (
...     SparkSession
...     .builder
...     .config("spark.master", "k8s://<k8s-endpoint>")
...     .config("spark.dynamicAllocation.enabled", "true")
...     .config("spark.dynamicAllocation.minExecutors", "1")
...     .config("spark.dynamicAllocation.maxExecutors", "4")
...     .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
...     .enableHiveSupport()
...     .getOrCreate()
... )
```

I am wondering if there is a way to do something like
```
spark.scaleExecutors(3)
```
to set the number of executors to 3. If this is not possible through the Python API, then perhaps through the SparkSession/SparkContext object in the JVM. (This API call is made up; I am only asking whether such functionality exists.)
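To make the request concrete, here is a minimal sketch of the behaviour I have in mind. Nothing in it is a Spark API: `ExecutorPool` and `scale_executors` are made-up stand-ins, and clamping the target to the configured minimum and maximum is only my assumption about how such a call might interact with the dynamic allocation settings.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPool:
    """Hypothetical stand-in for an application's executor set (not a Spark class)."""

    min_executors: int = 1  # mirrors spark.dynamicAllocation.minExecutors above
    max_executors: int = 4  # mirrors spark.dynamicAllocation.maxExecutors above
    current: int = 1

    def scale_executors(self, target: int) -> int:
        """Set the executor count to `target`, clamped to [min, max].

        Returns the executor count after the call. The clamping is an
        assumption about the desired semantics, not observed Spark behaviour.
        """
        self.current = max(self.min_executors, min(target, self.max_executors))
        return self.current


pool = ExecutorPool()
pool.scale_executors(3)   # a query is slow: ask for 3 executors
pool.scale_executors(10)  # above the cap: clamped to max_executors
```

The point is only that the call would take effect on a live application, without restarting the query or recreating the session.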

It seems to me that for dynamic allocation to work, there must be some internal API being called that sets the number of executors, scaling it up and down as required. I am not sure whether or where this internal API exists (I have had some trouble going through the source code, as I am not too familiar with Scala), but I am wondering if there is a way to expose it to the user.

The Kubernetes cluster I am using is an elastic resource, so my motivation is to utilize this elasticity during application runtime. If a query in my Spark application is running slowly, I'd like to add more executors to it without restarting my query or recreating my SparkSession/SparkContext.

Thanks,
Steven