[SS] Bug in StreamExecution? currentBatchId and getBatchDescriptionString for web UI

Jacek Laskowski
Hi,

While reviewing StreamExecution and how batches are displayed in the
web UI, I've noticed that currentBatchId is -1 when StreamExecution is
created [1] and becomes 0 when no offsets are available [2].

That leads to my question about setting the job description for a
query using getBatchDescriptionString [3]. It branches on
currentBatchId and gives "init" when it's -1 [4], which never happens,
as shown above.
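
To make the branch concrete, here is a minimal standalone sketch of the
behaviour described above. It is not the Spark source: the object and
method names are hypothetical, and only the "-1 gives init" rule comes
from this message.

```scala
// Hypothetical stand-in for the branch in getBatchDescriptionString.
// Assumption: any negative batch id is treated as "not started yet".
object BatchDescriptionSketch {
  def batchDescription(currentBatchId: Long): String =
    if (currentBatchId < 0) "init" else currentBatchId.toString

  def main(args: Array[String]): Unit = {
    println(batchDescription(-1L)) // prints "init"
    println(batchDescription(0L))  // prints "0"
  }
}
```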

That in turn leads to the PR for SPARK-20464 "Add a job group and
description for streaming queries and fix cancellation of running jobs
using the job group", which sets the job description after
populateStartOffsets [5].

Shouldn't it be set before populateStartOffsets, so that
getBatchDescriptionString has a chance of giving "init" and we don't
see two 0s?
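
The ordering question can be illustrated with a toy model. This is a
sketch under stated assumptions, not the StreamExecution code: ToyQuery
and its members are hypothetical, and populateStartOffsets is modelled
only for the "no offsets available" case, where the id moves from -1 to 0.

```scala
object OrderingSketch {
  // Hypothetical toy model of the state discussed in this thread.
  final class ToyQuery {
    var currentBatchId: Long = -1L // -1 at creation, as in [1]

    // Models only the case with no stored offsets: the id becomes 0.
    def populateStartOffsets(): Unit =
      if (currentBatchId < 0) currentBatchId = 0L

    def batchDescription: String =
      if (currentBatchId < 0) "init" else currentBatchId.toString
  }

  // Description computed after populateStartOffsets: "init" is unreachable.
  def describeAfter(): String = {
    val q = new ToyQuery
    q.populateStartOffsets()
    q.batchDescription
  }

  // Description computed before populateStartOffsets: "init" is reachable.
  def describeBefore(): String = {
    val q = new ToyQuery
    val description = q.batchDescription
    q.populateStartOffsets()
    description
  }

  def main(args: Array[String]): Unit = {
    println(describeAfter())  // prints "0"
    println(describeBefore()) // prints "init"
  }
}
```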

Help appreciated.

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L116
[2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L516
[3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L878-L883
[4] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L879
[5] https://github.com/apache/spark/commit/6fc6cf88d871f5b05b0ad1a504e0d6213cf9d331#diff-6532dd3b63bdab0364fbcf2303e290e4R294

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: [SS] Bug in StreamExecution? currentBatchId and getBatchDescriptionString for web UI

Jacek Laskowski
Hi,

Please disregard my finding. It does not seem to be a bug, just a
small piece of dead code: "init" will never be displayed in the web UI
because the minimum batch id at that point can only ever be 0, so
getBatchDescriptionString could be simplified a little.

Sorry for the noise.

Regards,
Jacek Laskowski


On Sat, Sep 9, 2017 at 9:21 PM, Jacek Laskowski <[hidden email]> wrote:

> Hi,
>
> While reviewing StreamExecution and how batches are displayed in the
> web UI, I've noticed that currentBatchId is -1 when StreamExecution is
> created [1] and becomes 0 when no offsets are available [2].
>
> That leads to my question about setting the job description for a
> query using getBatchDescriptionString [3]. It branches on
> currentBatchId and gives "init" when it's -1 [4], which never happens,
> as shown above.
>
> That in turn leads to the PR for SPARK-20464 "Add a job group and
> description for streaming queries and fix cancellation of running jobs
> using the job group", which sets the job description after
> populateStartOffsets [5].
>
> Shouldn't it be set before populateStartOffsets, so that
> getBatchDescriptionString has a chance of giving "init" and we don't
> see two 0s?
>
> Help appreciated.
>
> [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L116
> [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L516
> [3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L878-L883
> [4] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala?utf8=%E2%9C%93#L879
> [5] https://github.com/apache/spark/commit/6fc6cf88d871f5b05b0ad1a504e0d6213cf9d331#diff-6532dd3b63bdab0364fbcf2303e290e4R294
