SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Kazuaki Ishizaki
Hello community,

I am writing this e-mail to start a discussion about adding a structured intermediate representation for generating Java code from a program that uses the DataFrame or Dataset API, in addition to the current String-based representation.
This addition is based on the discussions in a thread at https://github.com/apache/spark/pull/21537#issuecomment-413268196

Please feel free to comment on the JIRA ticket or Google Doc.

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-25728
Google Doc: https://docs.google.com/document/d/1Jzf56bxpMpSwsGV_hSzl9wQG22hyI731McQcjognqxY/edit?usp=sharing

Looking forward to hearing your feedback.

Best Regards,
Kazuaki Ishizaki

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Takeshi Yamamuro
Hi, ishizaki-san,

Cool activity, I left some comments on the doc.

best,
takeshi


On Mon, Oct 15, 2018 at 12:05 AM Kazuaki Ishizaki <[hidden email]> wrote:
Hello community,

I am writing this e-mail to start a discussion about adding a structured intermediate representation for generating Java code from a program that uses the DataFrame or Dataset API, in addition to the current String-based representation.
This addition is based on the discussions in a thread at https://github.com/apache/spark/pull/21537#issuecomment-413268196

Please feel free to comment on the JIRA ticket or Google Doc.

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-25728
Google Doc: https://docs.google.com/document/d/1Jzf56bxpMpSwsGV_hSzl9wQG22hyI731McQcjognqxY/edit?usp=sharing

Looking forward to hearing your feedback.

Best Regards,
Kazuaki Ishizaki


--
---
Takeshi Yamamuro

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Kazuaki Ishizaki
Hi Yamamuro-san,
Thank you for your comments. This SPIP has received several valuable comments and feedback on the Google Doc: https://docs.google.com/document/d/1Jzf56bxpMpSwsGV_hSzl9wQG22hyI731McQcjognqxY/edit?usp=sharing
I hope that this SPIP can move forward based on this feedback.

Following the SPIP procedure at http://spark.apache.org/improvement-proposals.html, may I ask one or more PMC members to become a shepherd of this SPIP?
I would appreciate your kindness and cooperation.

Best Regards,
Kazuaki Ishizaki



From:        Takeshi Yamamuro <[hidden email]>
To:        Spark dev list <[hidden email]>
Cc:        [hidden email]
Date:        2018/10/15 12:12
Subject:        Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code




Hi, ishizaki-san,

Cool activity, I left some comments on the doc.

best,
takeshi



Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Xiao Li-2
Hi, Kazuaki, 

Thanks for your great SPIP! I am willing to be the shepherd of this SPIP. 

Cheers,

Xiao


On Mon, Oct 22, 2018 at 12:05 AM Kazuaki Ishizaki <[hidden email]> wrote:
Hi Yamamuro-san,
Thank you for your comments. This SPIP has received several valuable comments and feedback on the Google Doc: https://docs.google.com/document/d/1Jzf56bxpMpSwsGV_hSzl9wQG22hyI731McQcjognqxY/edit?usp=sharing
I hope that this SPIP can move forward based on this feedback.

Following the SPIP procedure at http://spark.apache.org/improvement-proposals.html, may I ask one or more PMC members to become a shepherd of this SPIP?
I would appreciate your kindness and cooperation.

Best Regards,
Kazuaki Ishizaki




Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Kazuaki Ishizaki
Hi Xiao,
Thank you very much for becoming a shepherd.
Once you feel the discussion has settled, we would appreciate it if you would start a vote.

Regards,
Kazuaki Ishizaki



From:        Xiao Li <[hidden email]>
To:        Kazuaki Ishizaki <[hidden email]>
Cc:        dev <[hidden email]>, Takeshi Yamamuro <[hidden email]>
Date:        2018/10/22 16:31
Subject:        Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code




Hi, Kazuaki, 

Thanks for your great SPIP! I am willing to be the shepherd of this SPIP. 

Cheers,

Xiao



Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

rxin
I have some pretty serious concerns over this proposal. I agree that there are many things that can be improved, but at the same time I also think the cost of introducing a new IR in the middle is extremely high. Having participated in designing some of the IRs in other systems, I've seen more failures than successes. The failures typically come from two sources: (1) in general it is extremely difficult to design IRs that are both expressive enough and simple enough; (2) typically another layer of indirection increases the complexity a lot more, beyond the level of understanding and expertise that most contributors can obtain without spending years in the code base and learning about all the gotchas.

In either case, I'm not saying "no please don't do this". This is one of those cases in which the devil is in the details, which cannot be captured by a high-level document, and I want to explicitly express my concern here.




On Thu, Oct 25, 2018 at 12:10 AM Kazuaki Ishizaki <[hidden email]> wrote:
Hi Xiao,
Thank you very much for becoming a shepherd.
Once you feel the discussion has settled, we would appreciate it if you would start a vote.

Regards,
Kazuaki Ishizaki




Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Kazuaki Ishizaki
Hi Reynold,
Thank you for your comments. They are great points.

1) Yes, it is not easy to design an IR that is both expressive enough and simple enough. We can learn concepts from good examples such as HyPer, Weld, and others, which are expressive without being complicated. The details have not been captured yet.
2) Introducing another layer means that contributors need some time to learn new things. This SPIP tries to reduce that learning time by providing clean APIs for constructing generated code. I will try to add some examples of APIs that are equivalent to the current string concatenations (e.g. "a" + " * " + "b" + " / " + "c").
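
To make the contrast with string concatenation concrete, here is a minimal sketch in Java of what a structured equivalent could look like. The names `Expr`, `Ref`, `Op2`, and `op2` are hypothetical and chosen only for illustration; they are not taken from Spark or from the SPIP document.

```java
import java.util.Objects;

// Hypothetical sketch: none of these types exist in Spark or in the SPIP.
final class ExprSketch {
    public static void main(String[] args) {
        // Current style: the expression exists only as concatenated text.
        String viaStrings = "a" + " * " + "b" + " / " + "c";

        // Structured style: the same expression as a small tree of nodes.
        Expr tree = Expr.op2("/",
                Expr.op2("*", Expr.ref("a"), Expr.ref("b")),
                Expr.ref("c"));

        System.out.println(viaStrings);    // prints: a * b / c
        System.out.println(tree.render()); // prints: ((a * b) / c)
    }
}

// A node that can emit Java source text for itself.
interface Expr {
    String render();

    // Factory for a reference to a named variable.
    static Expr ref(String name) {
        return new Ref(name);
    }

    // Factory for a binary operation such as "*" or "/".
    static Expr op2(String op, Expr left, Expr right) {
        return new Op2(op, left, right);
    }
}

final class Ref implements Expr {
    final String name;

    Ref(String name) {
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public String render() {
        return name;
    }
}

final class Op2 implements Expr {
    final String op;
    final Expr left;
    final Expr right;

    Op2(String op, Expr left, Expr right) {
        this.op = Objects.requireNonNull(op);
        this.left = Objects.requireNonNull(left);
        this.right = Objects.requireNonNull(right);
    }

    // Parenthesize every operation so precedence never depends on the text.
    @Override
    public String render() {
        return "(" + left.render() + " " + op + " " + right.render() + ")";
    }
}
```

In this sketch the tree keeps the operator and operands as separate fields, so the expression can be inspected or rewritten before any Java text is emitted, which plain concatenation does not allow.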

It is more important for us to learn from failures than from successes. We would appreciate it if you could list the failures that you have seen.

Best Regards,
Kazuaki Ishizaki



From:        Reynold Xin <[hidden email]>
To:        Kazuaki Ishizaki <[hidden email]>
Cc:        Xiao Li <[hidden email]>, dev <[hidden email]>, Takeshi Yamamuro <[hidden email]>
Date:        2018/10/26 03:46
Subject:        Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code




I have some pretty serious concerns over this proposal. I agree that there are many things that can be improved, but at the same time I also think the cost of introducing a new IR in the middle is extremely high. Having participated in designing some of the IRs in other systems, I've seen more failures than successes. The failures typically come from two sources: (1) in general it is extremely difficult to design IRs that are both expressive enough and simple enough; (2) typically another layer of indirection increases the complexity a lot more, beyond the level of understanding and expertise that most contributors can obtain without spending years in the code base and learning about all the gotchas.

In either case, I'm not saying "no please don't do this". This is one of those cases in which the devil is in the details, which cannot be captured by a high-level document, and I want to explicitly express my concern here.





Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

Kazuaki Ishizaki
Hi all,
I spent some time considering these great points. Sorry for the delay.
I have added comments in green to https://docs.google.com/document/d/1Jzf56bxpMpSwsGV_hSzl9wQG22hyI731McQcjognqxY/edit

Here is a summary of the comments:
1) For simplicity and expressiveness, introduce nodes to represent a structure (e.g. for, while)
2) For simplicity, measure some statistics (e.g. node / java bytecode, memory consumption)
3) For ease of understanding, use simple APIs like the original statements (op2, for, while, ...)
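
As a rough illustration of points 1) and 2), the following Java sketch shows structure nodes for `for` and `while` that render to Java source and report a node count. The names `Node`, `Stmt`, `Loop`, `While`, and `For` are hypothetical and are not taken from Spark or from the Google Doc; loop conditions and statements are kept as plain text here only to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these node types are illustrative, not the SPIP's API.
final class LoopSketch {
    public static void main(String[] args) {
        Node loop = new For("int i = 0", "i < n", "i++",
                new While("acc > limit", new Stmt("acc /= 2")),
                new Stmt("acc += values[i]"));
        System.out.print(loop.render());
        System.out.println("nodes: " + loop.size()); // prints: nodes: 4
    }
}

abstract class Node {
    // Append this node's Java source to out, indented by indent.
    abstract void render(StringBuilder out, String indent);

    // Number of IR nodes in this subtree (a simple statistic).
    abstract int size();

    String render() {
        StringBuilder out = new StringBuilder();
        render(out, "");
        return out.toString();
    }
}

// A single statement kept as opaque text.
final class Stmt extends Node {
    final String code;

    Stmt(String code) { this.code = code; }

    @Override void render(StringBuilder out, String indent) {
        out.append(indent).append(code).append(";\n");
    }

    @Override int size() { return 1; }
}

// Shared behaviour for nodes that own a body of child nodes.
abstract class Loop extends Node {
    final List<Node> body;

    Loop(Node... body) { this.body = Arrays.asList(body); }

    // The text before the opening brace, e.g. "while (cond)".
    abstract String header();

    @Override void render(StringBuilder out, String indent) {
        out.append(indent).append(header()).append(" {\n");
        for (Node child : body) {
            child.render(out, indent + "  ");
        }
        out.append(indent).append("}\n");
    }

    @Override int size() {
        int total = 1; // count this loop node itself
        for (Node child : body) {
            total += child.size();
        }
        return total;
    }
}

final class While extends Loop {
    final String cond;

    While(String cond, Node... body) {
        super(body);
        this.cond = cond;
    }

    @Override String header() { return "while (" + cond + ")"; }
}

final class For extends Loop {
    final String init;
    final String cond;
    final String update;

    For(String init, String cond, String update, Node... body) {
        super(body);
        this.init = init;
        this.cond = cond;
        this.update = update;
    }

    @Override String header() {
        return "for (" + init + "; " + cond + "; " + update + ")";
    }
}
```

Because nesting is held in the tree rather than in text, indentation and brace matching come from the renderer, and a statistic such as the node count is a simple traversal.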

We would appreciate any comments or suggestions on the Google Doc or the dev mailing list so that we can move forward.

Kazuaki Ishizaki



From:        "Kazuaki Ishizaki" <[hidden email]>
To:        Reynold Xin <[hidden email]>
Cc:        dev <[hidden email]>, Takeshi Yamamuro <[hidden email]>, Xiao Li <[hidden email]>
Date:        2018/10/31 00:56
Subject:        Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code




Hi Reynold,
Thank you for your comments. They are great points.


1) Yes, it is not easy to design an IR that is both expressive enough and simple enough. We can learn concepts from good examples such as HyPer, Weld, and others, which are expressive without being complicated. The details have not been captured yet.
2) Introducing another layer means that contributors need some time to learn new things. This SPIP tries to reduce that learning time by providing clean APIs for constructing generated code. I will try to add some examples of APIs that are equivalent to the current string concatenations (e.g. "a" + " * " + "b" + " / " + "c").

It is more important for us to learn from failures than from successes. We would appreciate it if you could list the failures that you have seen.


Best Regards,
Kazuaki Ishizaki



