I have a large-scale job that breaks for certain input sizes, so I am experimenting with checkpointing to split the DAG and isolate the problematic stage. I have some questions about checkpointing:
What is the utility of non-eager checkpointing?
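For context, what I am trying looks roughly like this (a sketch assuming an existing `spark` session; the paths and app name are illustrative, not from my actual job):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-test").getOrCreate()
spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints") // illustrative path

val df = spark.read.parquet("hdfs:///data/input") // illustrative input

// Eager (the default): the checkpoint is materialized immediately,
// and the returned DataFrame has a truncated plan.
val eagerCp = df.checkpoint()

// Non-eager: the checkpoint is only marked here; the data is written
// when the first action runs on lazyCp.
val lazyCp = df.checkpoint(eager = false)
```

My question is what the non-eager variant buys me in practice.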
How is checkpointing different from manually writing a dataframe (or RDD) to HDFS? Writing it manually also lets me re-read the stored dataframe later, whereas with checkpointing I don't see a simple way of re-reading the checkpointed data in a future job.
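By "manually writing" I mean something like the following (paths illustrative), which also cuts the lineage and, unlike a checkpoint, gives me files I can trivially load from any future job:

```scala
// Write the intermediate result out explicitly...
df.write.mode("overwrite").parquet("hdfs:///tmp/manual_cp") // illustrative path

// ...and read it back; the reloaded DataFrame's plan starts from these files.
val reloaded = spark.read.parquet("hdfs:///tmp/manual_cp")
```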
I read that checkpointing differs from persisting in that the lineage is not kept, but I don't understand why persisting keeps the lineage. The point of persisting is that the next computation starts from the persisted data (either memory or memory+disk), so what is the advantage of still having the lineage available? Am I missing some basic understanding of these two apparently different operations?
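To make the comparison concrete, this is the kind of thing I have been running to look at the lineage (a sketch; the storage level is just the one I happen to use, and `rdd.toDebugString` is how I inspect the dependency chain):

```scala
import org.apache.spark.storage.StorageLevel

// Persist the DataFrame and force materialization with an action.
val cached = df.persist(StorageLevel.MEMORY_AND_DISK)
cached.count()

// The full lineage is still attached to the persisted data; Spark can
// recompute lost partitions from it. It shows up in the debug string:
println(cached.rdd.toDebugString)

// After df.checkpoint(), by contrast, the printed plan is cut down to
// the checkpointed files as the source.
```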