Hi All,

The code below is from RangePartitioner:

// This is the sample size we need to have roughly balanced output partitions, capped at 1M.
val sampleSize = math.min(20.0 * partitions, 1e6)
// Assume the input partitions are roughly balanced and over-sample a little bit.
val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.length).toInt
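
To make the arithmetic concrete, here is a minimal sketch that wraps the two lines above in a standalone helper. The object name, the method name, the inputPartitions parameter (standing in for rdd.partitions.length), and the example values of 200 output partitions and 50 input partitions are my own assumptions for illustration; they are not part of RangePartitioner.

```scala
// Hypothetical helper: reproduces only the two quoted lines so the numbers can be checked by hand.
object SampleSizeSketch {
  // partitions: number of output partitions requested
  // inputPartitions: stands in for rdd.partitions.length
  def sampleSizes(partitions: Int, inputPartitions: Int): (Double, Int) = {
    // 20 samples per output partition, capped at 1M in total
    val sampleSize = math.min(20.0 * partitions, 1e6)
    // over-sample by a factor of 3 and spread the draws across the input partitions
    val sampleSizePerPartition = math.ceil(3.0 * sampleSize / inputPartitions).toInt
    (sampleSize, sampleSizePerPartition)
  }

  def main(args: Array[String]): Unit = {
    val (total, perPartition) = sampleSizes(partitions = 200, inputPartitions = 50)
    println(s"sampleSize=$total, sampleSizePerPartition=$perPartition")
  }
}
```

With these assumed values, 20.0 gives a target of 4000 samples (20 per output partition), and 3.0 raises the number actually drawn to 12000, i.e. 240 per input partition. So 20.0 controls how many samples back each output range boundary, and 3.0 is the over-sampling margin.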

The constants 20.0 and 3.0 are hardcoded. Why are they fixed at these values?

Do they come from a white paper or other research?

Regards

-Raintung Li