[SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

Jacek Laskowski
Hi,

I use Spark 2.4.4.

I've just noticed that the number of output rows metric of the StateStoreSaveExec physical operator does not seem to be measured in Append output mode. In other words, whatever happens before or after the StateStoreSaveExec operator, the metric is always 0.

It is measured for the other modes, Complete and Update.


Is this intentional? Why?

Regards,
Jacek Laskowski
----
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

Jungtaek Lim-2
Thanks for reporting. It may have been excluded intentionally, since it could have caused confusion before empty batches were introduced (the output rows are unrelated to the input rows of the current batch). But now that we have empty batches, I don't see a reason not to handle it. I'll file an issue and submit a patch.
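
To illustrate why Append-mode output rows are unrelated to the input rows of the same batch, here is a minimal toy model in plain Python. It is not Spark code: the class `AppendModeCounter`, its parameters, and the returned dictionary keys are all hypothetical names chosen for this sketch. It assumes a tumbling-window count where the watermark used by a batch is derived from event times seen in earlier batches, and where a window is emitted once its end is at or below that watermark.

```python
from collections import defaultdict


class AppendModeCounter:
    """Toy model of a windowed count in Append mode (illustration only)."""

    def __init__(self, window_size, delay):
        self.window_size = window_size
        self.delay = delay
        self.state = defaultdict(int)  # window start -> running count
        self.watermark = 0             # watermark used by the current batch

    def process_batch(self, event_times):
        # 1. Fold this batch's input into state, dropping rows whose
        #    window is already closed under the current watermark.
        for t in event_times:
            start = (t // self.window_size) * self.window_size
            if start + self.window_size > self.watermark:
                self.state[start] += 1

        # 2. Emit (and evict) every window that ends at or below the watermark.
        closed = sorted(
            s for s in self.state if s + self.window_size <= self.watermark
        )
        rows = [(s, self.state.pop(s)) for s in closed]

        # 3. Advance the watermark; it only takes effect in the next batch.
        if event_times:
            self.watermark = max(self.watermark, max(event_times) - self.delay)

        return {
            "numInputRows": len(event_times),
            "numOutputRows": len(rows),
            "rows": rows,
        }


op = AppendModeCounter(window_size=10, delay=5)
print(op.process_batch([1, 3, 17]))  # three input rows, nothing emitted yet
print(op.process_batch([]))          # empty batch emits the closed window
```

In this sketch the first batch reads three rows and emits none, while the following empty batch reads nothing and emits one row. That is the kind of mismatch between input and output counts the paragraph above refers to.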

Btw, there's a metric bug with empty batches as well: see SPARK-29314 [1], for which I've recently submitted a patch.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Sun, Oct 13, 2019 at 1:12 AM Jacek Laskowski <[hidden email]> wrote:
Is this intentional? Why?
Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

Jungtaek Lim-2
Filed SPARK-29450 [1] and raised a patch [2]. Please let me know if you would like to be assigned as the reporter of SPARK-29450.

On Sun, Oct 13, 2019 at 4:06 PM Jungtaek Lim <[hidden email]> wrote:
I'll file and submit a patch.
Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

Jacek Laskowski
Hi, 

That was really quick! #impressed

Thanks, HeartSaVioR, for such a prompt response! I'm fine with the current state of the issue, so no need to change anything. Whatever makes Spark shinier WFM!

Jacek

On Sun, 13 Oct 2019, 09:19 Jungtaek Lim, <[hidden email]> wrote:
Filed SPARK-29450 [1] and raised a patch [2].