# Lineage between Datasets

## Lineage between Datasets

Hi all,

I believe that there is no lineage between Datasets. Consider this case:

`val people = spark.read.parquet("...").as[Person]`

`val ageGreatThan30 = people.filter("age > 30")`

Since the second Dataset can push the filter condition down, the two obviously have different logical plans and hence different physical plans. Is my understanding right?

Thanks,
Chang

## Re: Lineage between Datasets

The physical plans are not subtrees of one another, but the analyzed plan (before the optimizer runs) is actually similar to "lineage". You can get it by calling `explain(true)` and looking at the analyzed plan.

On Wed, Apr 12, 2017 at 3:03 AM Chang Chen <[hidden email]> wrote:

> Hi all,
>
> I believe that there is no lineage between Datasets. Consider this case:
>
> `val people = spark.read.parquet("...").as[Person]`
>
> `val ageGreatThan30 = people.filter("age > 30")`
>
> Since the second Dataset can push the filter condition down, the two obviously have different logical plans and hence different physical plans. Is my understanding right?
>
> Thanks,
> Chang
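For readers who want to try the suggestion above, here is a minimal sketch, not taken from the thread. It assumes a local `SparkSession`, a hypothetical `Person(name, age)` case class (the thread never shows its definition), and in-memory rows in place of the elided parquet path. It reads the analyzed plan of each Dataset through `queryExecution.analyzed`, so the two trees can be compared side by side.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema: the thread does not show the definition of Person.
case class Person(name: String, age: Long)

object LineageSketch {
  // Returns the analyzed-plan tree strings of the parent Dataset
  // and of the Dataset derived from it with filter().
  def analyzedPlans(spark: SparkSession): (String, String) = {
    import spark.implicits._
    // Assumption: in-memory rows stand in for the elided parquet path.
    val people = Seq(Person("Ann", 25L), Person("Bob", 42L)).toDS()
    val ageGreatThan30 = people.filter("age > 30")
    (people.queryExecution.analyzed.treeString,
     ageGreatThan30.queryExecution.analyzed.treeString)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lineage-sketch")
      .getOrCreate()
    try {
      val (parent, child) = analyzedPlans(spark)
      println(parent) // analyzed plan of people
      println(child)  // analyzed plan of the filtered Dataset
    } finally {
      spark.stop()
    }
  }
}
```

Calling `ageGreatThan30.explain(true)` instead prints the parsed, analyzed, optimized, and physical plans together, which makes it easy to see where the optimizer starts to reshape the tree.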
## Re: Lineage between Datasets

Does it mean that any two Datasets' physical plans are independent?

Thanks,
Chang

On Thu, Apr 13, 2017 at 12:53 AM, Reynold Xin wrote:

> The physical plans are not subtrees of one another, but the analyzed plan (before the optimizer runs) is actually similar to "lineage". You can get it by calling `explain(true)` and looking at the analyzed plan.