Some question for range

raintung li
Hi All,

The code in RangePartitioner:

      // This is the sample size we need to have roughly balanced output partitions, capped at 1M.
      val sampleSize = math.min(20.0 * partitions, 1e6)
      // Assume the input partitions are roughly balanced and over-sample a little bit.
      val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.length).toInt

The constants 20.0 and 3.0 are hardcoded. Why are they fixed?

Do they come from a white paper or research?
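
To make the question concrete, here is a small standalone sketch of what those two lines work out to. The object name SampleSizeSketch, the helper sampleSizes, and the input values (100 output partitions, 10 input partitions) are made up for illustration and are not part of Spark; only the two formulas are taken from the code quoted above.

```scala
object SampleSizeSketch {
  // Hypothetical helper that repeats the two quoted formulas outside of Spark.
  // partitions      = number of output partitions requested
  // inputPartitions = stands in for rdd.partitions.length
  def sampleSizes(partitions: Int, inputPartitions: Int): (Double, Int) = {
    // Target total sample size: 20 per output partition, capped at 1e6.
    val sampleSize = math.min(20.0 * partitions, 1e6)
    // Per-input-partition sample size, over-sampled by a factor of 3.
    val sampleSizePerPartition = math.ceil(3.0 * sampleSize / inputPartitions).toInt
    (sampleSize, sampleSizePerPartition)
  }

  def main(args: Array[String]): Unit = {
    val (total, perPartition) = sampleSizes(partitions = 100, inputPartitions = 10)
    println(s"sampleSize = $total, sampleSizePerPartition = $perPartition")
  }
}
```

With these example values the target is 20.0 * 100 = 2000 samples, and each of the 10 input partitions is asked for ceil(3.0 * 2000 / 10) = 600, so about 6000 samples are requested in total, three times the target. Once 20.0 * partitions exceeds 1e6 (more than 50,000 output partitions), the cap applies and the target stays at 1e6.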


-Raintung Li