cannot call explain or show on dataframe in structured streaming addBatch dataframe


cannot call explain or show on dataframe in structured streaming addBatch dataframe

assaf.mendelson

Hi all,

I am playing around with structured streaming and looked at the code for ConsoleSink.

 

I see the code has:

data.sparkSession.createDataFrame(
    data.sparkSession.sparkContext.parallelize(data.collect()), data.schema)
  .show(numRowsToShow, isTruncated)

 

I was wondering why it does not call show on data directly. Why the collect and parallelize?

Thanks,
Assaf.

 


Re: cannot call explain or show on dataframe in structured streaming addBatch dataframe

Michael Armbrust
There is a little bit of weirdness in how we override the default query planner to replace it with an incrementalizing planner. As a result, calling any operation that changes the query plan (such as a LIMIT) causes it to revert to the batch planner and return the wrong answer. We should fix this before we finalize the Sink API.
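
To make the workaround concrete, here is a minimal sketch of a custom sink that follows the same collect-then-parallelize pattern as the ConsoleSink snippet quoted in the question. DebugSink and its numRows parameter are hypothetical names used only for illustration. Sink and addBatch are the internal Spark interfaces discussed in this thread, and their exact shape may differ between Spark versions, so treat this as a sketch rather than a definitive implementation.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical debugging sink; the class name and numRows are assumptions for illustration.
class DebugSink(numRows: Int = 20) extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // collect() runs the plan the streaming engine prepared for this batch,
    // without adding any new operators to it.
    val rows: Array[Row] = data.collect()

    // Rebuild an ordinary batch DataFrame from the materialized rows.
    val spark = data.sparkSession
    val batchDf = spark.createDataFrame(spark.sparkContext.parallelize(rows.toSeq), data.schema)

    // Operations that change the query plan now act on the rebuilt batch DataFrame,
    // not on the DataFrame handed to addBatch.
    println(s"Batch: $batchId")
    batchDf.explain()
    batchDf.show(numRows, true)
  }
}
```

Because collect() pulls every row of the batch to the driver, this pattern is only reasonable for debugging or for small batches.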

On Mon, Jun 19, 2017 at 9:32 AM, assaf.mendelson <[hidden email]> wrote:


 

I was wondering why it does not call show on data directly. Why the collect and parallelize?

 

 


 



Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
