Lineage between Datasets

Chang Chen
Hi All

I believe that there is no lineage between datasets. Consider this case:

val people = spark.read.parquet("...").as[Person]
val ageGreatThan30 = people.filter("age > 30")
Since the second Dataset can push the filter condition down, the two Datasets obviously have different logical plans and hence different physical plans.

Is my understanding right?

Thanks
Chang  
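
The pushdown described in this post can be observed with a small sketch. It assumes a local Spark session and a temporary Parquet directory standing in for the elided path "..."; the `Person` fields and the `PushdownSketch` object name are hypothetical, chosen only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical schema; the thread does not show the real Person class.
case class Person(name: String, age: Long)

object PushdownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pushdown-sketch")
      .getOrCreate()
    import spark.implicits._

    // Write a tiny Parquet dataset to a fresh temp location (assumption:
    // this replaces the "..." path from the post).
    val dir = Files.createTempDirectory("people-parquet").resolve("data").toString
    Seq(Person("Ann", 25L), Person("Bob", 42L)).toDS().write.parquet(dir)

    val people = spark.read.parquet(dir).as[Person]
    val ageGreatThan30 = people.filter("age > 30")

    // Each Dataset carries its own QueryExecution, so the optimized
    // logical plans can be printed and compared side by side.
    println(people.queryExecution.optimizedPlan)
    println(ageGreatThan30.queryExecution.optimizedPlan)

    // explain() prints the physical plan; for the filtered Dataset the
    // Parquet scan node is where any pushed-down filters would be listed.
    people.explain()
    ageGreatThan30.explain()

    spark.stop()
  }
}
```

Comparing the two printed physical plans shows whether the scan for the filtered Dataset differs from the scan for `people`.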
Re: Lineage between Datasets

rxin
The physical plans are not subtrees of one another, but the analyzed plan (before the optimizer runs) is actually similar to "lineage". You can get it by calling explain(true) and looking at the analyzed plan.
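
As a rough illustration of this reply, the sketch below calls `explain(true)` and then checks whether the parent's analyzed plan shows up as a subtree of the child's analyzed plan. It is a sketch under stated assumptions: in-memory data is used instead of the Parquet file from the question, and the `Person` fields, the `LineageSketch` object, and the `containsPlan` helper are hypothetical names introduced here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical schema; the thread does not show the real Person class.
case class Person(name: String, age: Long)

object LineageSketch {
  // Hypothetical helper: true if some node of `child` produces the same
  // result as `parent`, i.e. the parent plan appears as a subtree.
  def containsPlan(child: LogicalPlan, parent: LogicalPlan): Boolean =
    child.find(_.sameResult(parent)).isDefined

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lineage-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumption: in-memory rows replace spark.read.parquet("...").
    val people = Seq(Person("Ann", 25L), Person("Bob", 42L)).toDS()
    val ageGreatThan30 = people.filter("age > 30")

    // Prints the parsed, analyzed, and optimized logical plans,
    // followed by the physical plan.
    ageGreatThan30.explain(true)

    // The analyzed plans are also available programmatically.
    val parentAnalyzed = people.queryExecution.analyzed
    val childAnalyzed = ageGreatThan30.queryExecution.analyzed
    println(s"parent analyzed plan found in child: ${containsPlan(childAnalyzed, parentAnalyzed)}")

    spark.stop()
  }
}
```

The same check against `queryExecution.optimizedPlan` or `queryExecution.executedPlan` is a way to see how the relationship changes once the optimizer and planner have run.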


(In reply to Chang Chen's message of Wed, Apr 12, 2017 at 3:03 AM.)
Re: Lineage between Datasets

Chang Chen
Does that mean the physical plans of any two Datasets are independent of each other?

Thanks
Chang

(In reply to Reynold Xin's message of Thu, Apr 13, 2017 at 12:53 AM.)
