It's because the two APIs are designed differently.

RDD.checkpoint returns void, which means it mutates the RDD's state, so you need an RDD.isCheckpointed method to check whether that RDD has been checkpointed.

Dataset.checkpoint returns a new Dataset, which means the Dataset itself carries no isCheckpointed state, and thus we don't need a Dataset.isCheckpointed method.
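
To make the difference concrete, here is a minimal sketch, assuming a local SparkSession on Spark 2.1+ and a temporary checkpoint directory. The object name CheckpointApiDemo, the master setting, and the directory are illustrative choices, not anything taken from the thread.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CheckpointApiDemo {
  // Returns (RDD flag before checkpointing, RDD flag after, flag seen via ds.rdd).
  def run(): (Boolean, Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[2]")              // assumption: local run for illustration
      .appName("checkpoint-api-demo")
      .getOrCreate()
    try {
      // Assumption: a throwaway directory is good enough for this demo.
      spark.sparkContext.setCheckpointDir(
        Files.createTempDirectory("checkpoints").toString)

      // RDD API: checkpoint() returns Unit and marks this very RDD;
      // the data is written once the first action has run.
      val rdd = spark.sparkContext.parallelize(1 to 100)
      val before = rdd.isCheckpointed
      rdd.checkpoint()
      rdd.count()
      val after = rdd.isCheckpointed

      // Dataset API: checkpoint() returns a new Dataset backed by the
      // checkpointed data; the original `ds` is not modified.
      val ds = spark.range(100)
      val checkpointed = ds.checkpoint()
      checkpointed.count()
      val viaRdd = ds.rdd.isCheckpointed

      (before, after, viaRdd)
    } finally spark.stop()
  }
}
```

Given the behavior described in this thread, run() should give false, then true for the RDD, and false for the flag read through ds.rdd. The practical consequence is that the value returned by Dataset.checkpoint is the thing to hold on to, since the original Dataset is not changed.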

Actually, I realized that keeping the info would not be enough, as I also need to locate the checkpoint files in order to delete them :/
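
One possible way to handle the cleanup, sketched under the assumption that deleting the whole checkpoint directory of the SparkContext is acceptable. The helper name deleteCheckpointDir is hypothetical. It relies on SparkContext.getCheckpointDir and the Hadoop FileSystem API, and it does not single out the files of one particular Dataset.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

object CheckpointCleanup {
  // Hypothetical helper: removes the entire checkpoint directory of `sc`.
  // Returns true only if a directory was configured and the delete succeeded.
  def deleteCheckpointDir(sc: SparkContext): Boolean =
    sc.getCheckpointDir.exists { dir =>
      val path = new Path(dir)
      val fs = path.getFileSystem(sc.hadoopConfiguration)
      fs.delete(path, true) // true = recursive
    }
}
```

This removes every checkpoint under that directory, including ones still in use, so it only makes sense once none of the checkpointed data is needed any more. Spark also has a spark.cleaner.referenceTracking.cleanCheckpoints setting that may be worth a look; whether it covers this use case is not something this sketch verifies.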

As far as I understand, Dataset.rdd is not the same as InternalRDD.
It is just another RDD representation of the same Dataset and is created on demand (lazy val) when Dataset.rdd is called.
This totally explains the observed behavior.

But how would it be possible to know that a Dataset has been checkpointed?
Should I manually keep track of that info?

Hello everyone,

I have a question about checkpointing Datasets.

It seems that in 2.1.0 there is a Dataset.checkpoint(); however, unlike RDD, there is no Dataset.isCheckpointed().

I wonder if Dataset.checkpoint is syntactic sugar for Dataset.rdd.checkpoint.
When I do:

Dataset.checkpoint; Dataset.count
Dataset.rdd.isCheckpointed // result: false

However, when I explicitly do:
Dataset.rdd.checkpoint; Dataset.rdd.count
Dataset.rdd.isCheckpointed // result: true

Could someone explain this behavior to me, or provide some references?

