Clarification on Spark code comments

Neerav Kumar
Hi,

I am new to the community, so pardon me if my question is framed incorrectly. I was going through the Spark code base on GitHub and am confused by a comment in the file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/util/PeriodicRDDCheckpointer.scala
The comment says:
Example usage:
* {{{
* val (rdd1, rdd2, rdd3, ...) = ...
* val cp = new PeriodicRDDCheckpointer(2, sc)
* cp.update(rdd1)
* rdd1.count();
* // persisted: rdd1
* cp.update(rdd2)
* rdd2.count();
* // persisted: rdd1, rdd2
* // checkpointed: rdd2
* cp.update(rdd3)
* rdd3.count();
* // persisted: rdd1, rdd2, rdd3
* // checkpointed: rdd2 rdd3
* cp.update(rdd4)
* rdd4.count();
* // persisted: rdd2, rdd3, rdd4
* // checkpointed: rdd4
* cp.update(rdd5)
* rdd5.count();
* // persisted: rdd3, rdd4, rdd5
* // checkpointed: rdd4 rdd5
* }}}

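The checkpointed value does not make sense to me after rdd3.count() and rdd5.count(). On those two "checkpointed" lines, the first value (rdd2 and rdd4, respectively) is the existing one, which I have crossed out, and the second value (rdd3 and rdd5) is the one I think makes sense. Is my understanding incorrect, or is this a bug in the documentation?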

Regards
Neerav