understanding the plans of spark sql

understanding the plans of spark sql

asma zgolli
Hello,

I'm executing an SQL workload with Spark SQL on data stored in MongoDB.

I have a question about where the aggregation is executed. I'm wondering whether the aggregation is pushed down to MongoDB (like the pushdown of filters and projections) or executed in Spark. When I display the physical plan in Spark, it includes HashAggregate operators, but in the log of my MongoDB server the execution plan contains aggregation pipelines.

I am really confused. Thank you very much for your answers.
Yours sincerely,
Asma ZGOLLI

PhD student in data engineering - computer science
email alt:  [hidden email]
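[Editor's note: a minimal sketch of how such a plan can be displayed, assuming the MongoDB Spark connector 10.x is on the classpath and Spark 3.0+ for explain("formatted"); the URI, database, collection, and column names below are placeholders for illustration only.]

```scala
import org.apache.spark.sql.SparkSession

object ExplainMongoAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explain-mongo-aggregation")
      .getOrCreate()

    // Hypothetical connection details; the format name "mongodb" and these
    // option keys follow the 10.x connector docs and may differ per version.
    val orders = spark.read
      .format("mongodb")
      .option("connection.uri", "mongodb://localhost:27017")
      .option("database", "shop")
      .option("collection", "orders")
      .load()

    orders.createOrReplaceTempView("orders")

    val agg = spark.sql(
      "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id")

    // "formatted" prints one operator per line, which makes it easy to spot
    // HashAggregate and the data source scan (with any pushed filters/projections).
    agg.explain("formatted")

    spark.stop()
  }
}
```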


Re: understanding the plans of spark sql

rxin
This is more of a question for the connector: it depends on how the connector is implemented. Some connectors implement aggregate pushdown, but most don't.
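[Editor's note: one rough way to check this from the Spark side, reusing the hypothetical `agg` DataFrame from the sketch above. If the executed plan still contains a HashAggregate operator, Spark performs the aggregation itself; pushed-down filters and projections show up inside the scan node instead.]

```scala
// Inspect the final physical plan as a string. This is a heuristic check,
// not an official API for detecting pushdown.
val physicalPlan = agg.queryExecution.executedPlan.toString()

if (physicalPlan.contains("HashAggregate")) {
  println("Aggregation is executed in Spark (no aggregate pushdown).")
} else {
  println("No HashAggregate in the plan; the connector may have pushed the aggregation down.")
}
```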

