Hi,
I'm wondering what's so special about 200 that it was chosen as the default
value of spark.shuffle.sort.bypassMergeThreshold.
Is it an arbitrary number? Is there any theory behind it?
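
For context, here's my reading of how the threshold gets consulted,
paraphrased from SortShuffleWriter.shouldBypassMergeSort in the Spark 2.x
sources (a sketch, not the verbatim code):

import org.apache.spark.{ShuffleDependency, SparkConf}

// The bypass-merge path is only taken when there is no map-side
// aggregation and the number of reduce partitions stays at or below
// spark.shuffle.sort.bypassMergeThreshold (default: 200).
def shouldBypassMergeSort(conf: SparkConf, dep: ShuffleDependency[_, _, _]): Boolean = {
  if (dep.mapSideCombine) {
    false
  } else {
    val threshold = conf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200)
    dep.partitioner.numPartitions <= threshold
  }
}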
Is the default number of shuffle partitions in Spark SQL, i.e. 200
(spark.sql.shuffle.partitions), somehow related to
spark.shuffle.sort.bypassMergeThreshold?
scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res3: Int = 200
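
FWIW, the two values can be put side by side in spark-shell (a sketch
assuming Spark 2.x; since the core shuffle property isn't stored in the
conf unless set explicitly, I pass the documented default, 200, as the
fallback):

scala> spark.conf.get("spark.sql.shuffle.partitions")
res0: String = 200

scala> sc.getConf.get("spark.shuffle.sort.bypassMergeThreshold", "200")
res1: String = 200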
I'd appreciate any guidance that would help me get the gist of this
seemingly magic number. Thanks!
Best regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski