Checkpoint and recomputation


Li Jin
Hi dear devs,

I recently came across the checkpoint functionality in Spark and found, somewhat surprisingly, that checkpoint causes the DataFrame to be computed twice unless cache is called before checkpoint.
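To make the behavior described above concrete, here is a toy model in plain Python. It is only a sketch and not Spark code: the `LazyDataset` class, its methods, and the `expensive` function are all hypothetical, and the two-pass structure inside `checkpoint` (one pass for an action, one pass to write the data) is an assumption about why the recomputation happens. It just counts how many times the lineage is evaluated with and without a prior `cache()`.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the rows (the "lineage")
        self._use_cache = False   # set by cache()
        self._cached = None       # rows kept after the first evaluation, if caching
        self.evaluations = 0      # how many times the lineage was actually run

    def cache(self):
        # Mark the dataset so the first evaluation keeps its rows in memory.
        self._use_cache = True
        return self

    def _rows(self):
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        rows = self._compute()
        if self._use_cache:
            self._cached = rows
        return rows

    def checkpoint(self):
        # Assumed two-pass behavior: an action materializes the data first,
        # then a separate pass reads the rows again to write them out.
        len(self._rows())           # pass 1: the action (similar to a count)
        return list(self._rows())   # pass 2: the checkpoint write


def expensive():
    # Stands in for an expensive DataFrame computation.
    return [x * x for x in range(5)]


plain = LazyDataset(expensive)
plain.checkpoint()

cached = LazyDataset(expensive).cache()
cached.checkpoint()

print(plain.evaluations, cached.evaluations)
```

In this model the uncached dataset runs its computation in both passes, while the cached one runs it only once because the second pass reads the stored rows.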

My guess is that this is probably hard to fix, and/or that the checkpoint feature is not used very frequently?