[SS] Why does ConsoleSink's addBatch convert input DataFrame to show it?

Jacek Laskowski
Hi,

Just noticed that in ConsoleSink's addBatch the input DataFrame is
collect'ed and then parallelize'd simply to print it to the console [1].
Why so many fairly expensive operations just for show?

I'd appreciate some help understanding this code. Thanks.

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L51-L53
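For reference, the collect-then-parallelize pattern in question looks roughly like the sketch below. It is not the code from [1]. It is a standalone approximation that assumes Spark 2.x or later with spark-sql on the classpath, and the names `ConsoleShowSketch` and `showViaCopy` are hypothetical. It uses only public APIs: `Dataset.collect`, `SparkContext.parallelize`, `SparkSession.createDataFrame(RDD[Row], StructType)` and `Dataset.show(Int, Boolean)`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConsoleShowSketch {

  // Hypothetical helper approximating the pattern linked in [1]:
  // pull the batch to the driver, turn it back into an RDD, and wrap it
  // in a brand-new DataFrame with the same schema before calling show.
  def showViaCopy(data: DataFrame, numRows: Int = 20, truncate: Boolean = true): DataFrame = {
    val spark = data.sparkSession
    // 1. collect: every row of the batch is brought to the driver
    val rows = data.collect()
    // 2. parallelize: the local rows are distributed again as an RDD[Row]
    val rdd = spark.sparkContext.parallelize(rows.toSeq)
    // 3. createDataFrame: a fresh plan backed only by that RDD and the original schema
    val copy = spark.createDataFrame(rdd, data.schema)
    copy.show(numRows, truncate)
    copy
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("console-show-sketch")
      .getOrCreate()
    import spark.implicits._

    // A small static DataFrame standing in for one micro-batch
    val batch = Seq((1L, "a"), (2L, "b"), (3L, "c")).toDF("id", "value")
    showViaCopy(batch)
    spark.stop()
  }
}
```

Note that `collect()` materializes the whole batch on the driver even though `show` prints at most `numRows` rows, which is the cost the question is about.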

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski