Determine global watermark via StreamingQueryProgress eventTime watermark String

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Determine global watermark via StreamingQueryProgress eventTime watermark String

dwichman
Hi Spark Developers,

Is it possible to reliably determine the current global watermark that is
being used in a streaming query via StreamingQueryProgress.onQueryProgress
eventTime watermark String?

https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/sql/streaming/StreamingQueryProgress.html 

The intention would be to precede the query watermark function with
something like a map function and compare event times with the assumed
global watermark to determine if the event will be dropped (i.e. too late).

If StreamingQueryProgress.onQueryProgress eventTime watermark does not
accurately reflect the current global watermark, is there another way to
reliably determine it?

Thanks for your help.

-Derek



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Determine global watermark via StreamingQueryProgress eventTime watermark String

Jungtaek Lim-2
There was a similar question (but another approach) and I've explained the current status a bit.


I guess this would also answer your question as well. At least for now, Spark doesn't expose the current watermark in specific micro-batch to the user level. It's abstracted away. I'm not sure knowing the exact global watermark "outside" of the query would be able to affect the running query.

If there's a strong demand, we could probably consider adding some function which provides the current watermark. I guess producing dropped events via side-output is something we are in favor of (if that is not quite hard to do), more than exposing the current watermark and letting users do that instead.


On Wed, Mar 17, 2021 at 1:20 AM dwichman <[hidden email]> wrote:
Hi Spark Developers,

Is it possible to reliably determine the current global watermark that is
being used in a streaming query via StreamingQueryProgress.onQueryProgress
eventTime watermark String?

https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/sql/streaming/StreamingQueryProgress.html

The intention would be to precede the query watermark function with
something like a map function and compare event times with the assumed
global watermark to determine if the event will be dropped (i.e. too late).

If StreamingQueryProgress.onQueryProgress eventTime watermark does not
accurately reflect the current global watermark, is there another way to
reliably determine it?

Thanks for your help.

-Derek



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]