[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

Jacek Laskowski
Hi,

I haven't spent much time on it, but the following DEBUG message from WatermarkTracker sparked my interest :)

I ran a streaming aggregation in Append mode and got the messages:

19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0: EventTimeStats(15000,1000,8000.0,2)
19/10/08 10:48:56 INFO WatermarkTracker: Updating event-time watermark from 0 to 5000 ms
19/10/08 10:48:56 DEBUG WatermarkTracker: Event time watermark didn't move: 5000 < 5000

I think the DEBUG message "Event time watermark didn't move" seems incorrect given that the query has just started and "Observed event time stats". It's true that the event-time watermark didn't move if it was 5000 before, but it was not as it has just started from scratch (no checkpointed state).

Can anyone shed some light on this? I'll be digging deeper in a bit, but am hoping to get some more info before. Thanks!

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Reply | Threaded
Open this post in threaded view
|

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

Jungtaek Lim-2
It might be due to empty batch (activated when there're stateful operator(s) and the previous batch advances watermark), which has no input so no moving watermark.

Did you only turn on DEBUG for WatermarkTracker? If you turn on DEBUG for MicroBatchExecution as well, it would log "Completed batch <batchId>" so if I'm not missing, it should be logged between updating event-time watermark and watermark didn't move. You can attach streaming query listener and get more information about batches.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski <[hidden email]> wrote:
Hi,

I haven't spent much time on it, but the following DEBUG message from WatermarkTracker sparked my interest :)

I ran a streaming aggregation in Append mode and got the messages:

19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0: EventTimeStats(15000,1000,8000.0,2)
19/10/08 10:48:56 INFO WatermarkTracker: Updating event-time watermark from 0 to 5000 ms
19/10/08 10:48:56 DEBUG WatermarkTracker: Event time watermark didn't move: 5000 < 5000

I think the DEBUG message "Event time watermark didn't move" seems incorrect given that the query has just started and "Observed event time stats". It's true that the event-time watermark didn't move if it was 5000 before, but it was not as it has just started from scratch (no checkpointed state).

Can anyone shed some light on this? I'll be digging deeper in a bit, but am hoping to get some more info before. Thanks!

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Reply | Threaded
Open this post in threaded view
|

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

Jacek Laskowski
Hi HeartSaVioR,

> It might be due to empty batch

Yeah...that's my understanding too. It's for a streaming aggregation in Append output mode so that's possible. I'll have a closer look at it.

Thanks much for keeping up with this and the other questions. Much appreciated!

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals


On Mon, Oct 14, 2019 at 12:42 AM Jungtaek Lim <[hidden email]> wrote:
It might be due to empty batch (activated when there're stateful operator(s) and the previous batch advances watermark), which has no input so no moving watermark.

Did you only turn on DEBUG for WatermarkTracker? If you turn on DEBUG for MicroBatchExecution as well, it would log "Completed batch <batchId>" so if I'm not missing, it should be logged between updating event-time watermark and watermark didn't move. You can attach streaming query listener and get more information about batches.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski <[hidden email]> wrote:
Hi,

I haven't spent much time on it, but the following DEBUG message from WatermarkTracker sparked my interest :)

I ran a streaming aggregation in Append mode and got the messages:

19/10/08 10:48:56 DEBUG WatermarkTracker: Observed event time stats 0: EventTimeStats(15000,1000,8000.0,2)
19/10/08 10:48:56 INFO WatermarkTracker: Updating event-time watermark from 0 to 5000 ms
19/10/08 10:48:56 DEBUG WatermarkTracker: Event time watermark didn't move: 5000 < 5000

I think the DEBUG message "Event time watermark didn't move" seems incorrect given that the query has just started and "Observed event time stats". It's true that the event-time watermark didn't move if it was 5000 before, but it was not as it has just started from scratch (no checkpointed state).

Can anyone shed some light on this? I'll be digging deeper in a bit, but am hoping to get some more info before. Thanks!

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals