Behavior of checkpointLocation from options vs setting conf spark.sql.streaming.checkpointLocation

Shubham Chaurasia
Hi,

I would like to confirm the checkpointing behavior. I have observed the following scenarios:

1) When I set checkpointLocation on the streaming query itself, like:

import org.apache.spark.sql.streaming.Trigger

// rateDF is an existing streaming DataFrame
val query = rateDF.writeStream
  .format("console")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime("1 seconds"))
  .option("checkpointLocation", "/Users/shubham/checkpoint_from_query1")
  .queryName("q2")
  .start()

It generates all the metadata in /Users/shubham/checkpoint_from_query1 regardless of whether queryName is set or not.

2) When I set it from the conf, like:

spark.conf.set("spark.sql.streaming.checkpointLocation", "/Users/shubham/checkpoint_from_conf")

I observed two cases here:
2.1) When I set the queryName, like .queryName("q2"), it generates all metadata under /Users/shubham/checkpoint_from_conf/q2

2.2) When queryName is not set, it generates all metadata under /Users/shubham/checkpoint_from_conf/<some-random-uuid>
 
I have seen the query recover successfully in scenarios 1) and 2.1), which is fine.
It does not recover in scenario 2.2), which is also fine, as it somehow has no way to get hold of the earlier query.
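
The scenarios above can be summarized in a small model. This is only a sketch of the behavior observed in this thread, not Spark's actual code: the object CheckpointModel, the method resolve, and its parameter names are hypothetical, and the case where neither the option nor the conf is set is deliberately left unmodeled (it returns None).

```scala
import java.util.UUID

// Hypothetical model of the observed checkpoint path resolution.
// It does not call Spark; it only mirrors scenarios 1), 2.1) and 2.2).
object CheckpointModel {
  def resolve(
      optionLocation: Option[String], // .option("checkpointLocation", ...)
      confLocation: Option[String],   // spark.sql.streaming.checkpointLocation
      queryName: Option[String]       // .queryName(...)
  ): Option[String] =
    // Scenario 1: a location set on the query is used as-is; queryName is ignored.
    optionLocation.orElse(
      confLocation.map { root =>
        // Scenario 2.1: <conf root>/<queryName>
        // Scenario 2.2: <conf root>/<fresh random UUID>, different on every start
        val child = queryName.getOrElse(UUID.randomUUID().toString)
        s"${root.stripSuffix("/")}/$child"
      }
    )
}
```

Calling resolve twice with no queryName yields two different directories, which matches the observation that scenario 2.2) does not pick up the earlier checkpoint after a restart.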

Can there be any other possibility? Would like to confirm.

Thanks,
Shubham

Re: Behavior of checkpointLocation from options vs setting conf spark.sql.streaming.checkpointLocation

Gabor Somogyi
Hi Shubham,

I've just checked the latest master branch and I can confirm it works as you've described.
As a workaround, one can read the <some-random-uuid> from the directory structure and set it with .queryName("<some-random-uuid>") before restarting.
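
A minimal sketch of the first half of that workaround, assuming the checkpoint root is on the local filesystem (for HDFS or an object store a different file API would be needed). The object CheckpointDirs and the method uuidDirs are hypothetical names used only for illustration.

```scala
import java.io.File
import java.util.UUID
import scala.util.Try

// Hypothetical helper: lists the UUID-named subdirectories of a checkpoint root.
object CheckpointDirs {
  def uuidDirs(root: File): Seq[String] = {
    // listFiles() returns null when root is missing or not a directory.
    val children = Option(root.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    children
      .filter(_.isDirectory)
      .map(_.getName)
      // Keep only names that round-trip through UUID parsing unchanged.
      .filter(name => Try(UUID.fromString(name)).toOption.exists(_.toString == name.toLowerCase))
      .sorted
  }
}
```

If exactly one name comes back for /Users/shubham/checkpoint_from_conf, it can be passed to .queryName(...) before the query is restarted. If several come back, this sketch cannot tell which one belongs to which query.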

BR,
G


(In reply to Shubham Chaurasia's message of Tue, Dec 11, 2018 at 6:45 AM.)

Re: Behavior of checkpointLocation from options vs setting conf spark.sql.streaming.checkpointLocation

Shubham Chaurasia
Thanks Gabor.

(In reply to Gabor Somogyi's message of Wed, Dec 12, 2018, 4:06 PM.)