[SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

Jacek Laskowski
Hi,

I'm curious why EventTimeStatsAccum is not a named accumulator (not to mention SQLMetric) so the event-time watermark could be monitored in web UI?

I've changed the code for EventTimeWatermarkExec physical operator to register EventTimeStatsAccum as a named accumulator and the values are properly propagated back to the driver and the web UI. It seems to be working fine (and it's just a one-day coding).

It went fairly easy to have a very initial prototype so I'm wondering why it's not been included? Has this been considered? Should I file a JIRA ticket and send a pull request for review? Please guide as I found it very helpful (and surprisingly easy to implement so I'm worried I'm missing something important). Thanks.

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Reply | Threaded
Open this post in threaded view
|

Re: [SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

Jacek Laskowski
Hi,

After some thinking about it, I may have found out the reason why not to expose EventTimeStatsAccum as a named accumulator. The reason is that it's an internal part of how event-time watermark works and should not be exposed via web UI as much as if it was part of a Spark app (the web UI is meant for).

With that being said, I'm wondering why is EventTimeStatsAccum not a SQL metric then? With that, it'd be in web UI, but just in the physical plan of a streaming query.

WDYT?

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals


On Mon, Jun 10, 2019 at 8:59 PM Jacek Laskowski <[hidden email]> wrote:
Hi,

I'm curious why EventTimeStatsAccum is not a named accumulator (not to mention SQLMetric) so the event-time watermark could be monitored in web UI?

I've changed the code for EventTimeWatermarkExec physical operator to register EventTimeStatsAccum as a named accumulator and the values are properly propagated back to the driver and the web UI. It seems to be working fine (and it's just a one-day coding).

It went fairly easy to have a very initial prototype so I'm wondering why it's not been included? Has this been considered? Should I file a JIRA ticket and send a pull request for review? Please guide as I found it very helpful (and surprisingly easy to implement so I'm worried I'm missing something important). Thanks.

Pozdrawiam,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals