What does spark.python.worker.memory affect?



Cyanny LIANG
As the documentation says:
Amount of memory to use per Python worker process during aggregation, in the same format as JVM memory strings (e.g. 512m, 2g). If the memory used during aggregation goes above this amount, it will spill the data into disks.
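For context, one way to set this option is through SparkConf when building the context. This is a minimal configuration sketch, not from the original thread; the 512m value is purely illustrative:

```python
from pyspark import SparkConf, SparkContext

# Cap the memory each Python worker may use during aggregation
# before spilling to disk (illustrative value, tune per workload).
conf = SparkConf().set("spark.python.worker.memory", "512m")
sc = SparkContext(conf=conf)
```

The same setting can also go in spark-defaults.conf or be passed via `--conf` on spark-submit.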

I searched for the config in the Spark source code, and only rdd.py uses the option.
This means the option only applies to Python RDD operations such as
rdd.groupByKey or rdd.sortByKey.
The Python ExternalSorter or ExternalMerger will spill data to disk when memory reaches the spark.python.worker.memory limit.
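The spill-on-limit behavior described above can be sketched in plain Python. This is a toy illustration of the pattern, not Spark's actual ExternalMerger: it buffers (key, value) pairs, spills the buffer to a temp file once a limit is reached (an item count here stands in for the real memory estimate), and merges spilled data back at the end, like groupByKey:

```python
import os
import pickle
import tempfile
from collections import defaultdict

class TinyExternalMerger:
    """Toy spill-on-limit merger: buffer pairs in memory, spill to a
    temp file when the buffer exceeds `limit` items (a stand-in for
    the spark.python.worker.memory threshold)."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.buffer = defaultdict(list)
        self.size = 0
        self.spill_files = []

    def put(self, key, value):
        self.buffer[key].append(value)
        self.size += 1
        if self.size >= self.limit:
            self._spill()

    def _spill(self):
        # Write the current buffer to disk and reset it.
        f = tempfile.NamedTemporaryFile(delete=False)
        pickle.dump(dict(self.buffer), f)
        f.close()
        self.spill_files.append(f.name)
        self.buffer.clear()
        self.size = 0

    def grouped(self):
        """Merge in-memory data with all spilled files, groupByKey-style."""
        merged = defaultdict(list)
        for path in self.spill_files:
            with open(path, "rb") as f:
                for k, vs in pickle.load(f).items():
                    merged[k].extend(vs)
            os.unlink(path)
        self.spill_files = []
        for k, vs in self.buffer.items():
            merged[k].extend(vs)
        self.buffer.clear()
        return dict(merged)
```

With `limit=4`, inserting five pairs forces one spill, and `grouped()` still returns every value per key, which is the essential guarantee of the external-merge approach.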

When PythonRunner forks a Python worker subprocess, what is the memory limit for each Python worker? Does spark.python.worker.memory bound the total memory of a Python worker process?

Best & Regards
Cyanny LIANG