Revisiting Python / pandas UDF (new proposal)

Hyukjin Kwon
Hi all,

I happened to come up with another idea for the pandas UDF redesign.
Thanks to Reynold, Bryan, Xiangrui, Takuya and Tim for the offline discussions and
for helping me write this proposal.

Please take a look and let me know what you guys think.

- https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing
- https://issues.apache.org/jira/browse/SPARK-28264

I know it's the holiday season, but please take some time to have a look so
we can make it in time for the code freeze (31st Jan).

Re: Revisiting Python / pandas UDF (new proposal)

Hyukjin Kwon
Thanks for the comments, Maciej. I am addressing them.
Adding Li Jin too.

I plan to proceed with this late this week or early next week so that it lands in time for the code freeze.
I will respond actively, so please share any feedback you have :-).



In reply to Hyukjin Kwon's message of Mon, Dec 30, 2019, 6:45 PM.

Re: Revisiting Python / pandas UDF (new proposal)

Li Jin
I am going to review this carefully today. Thanks for the work!

Li

In reply to Hyukjin Kwon's message of Wed, Jan 1, 2020, 10:34 PM.

Re: Revisiting Python / pandas UDF (new proposal)

Li Jin
Hyukjin,

Thanks for putting this together. I took a look at the proposal and left some comments. At a high level, I like using type hints to specify input/output types, but I am not so sure about using type hints for cardinality. I have left more detailed comments in the doc.

Li
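
For readers following the thread, here is a minimal sketch of the two uses of type hints mentioned above: describing input/output types, and describing cardinality (one batch in and one batch out, a stream of batches, or many rows reduced to one value). It is not taken from the proposal or from Spark's API. `Series` is a stand-in class so the example runs without pandas, and `infer_udf_kind` and the kind names it returns are hypothetical.

```python
import collections.abc
from typing import Iterator, get_args, get_origin, get_type_hints


class Series:
    """Stand-in for pandas.Series so the sketch runs without pandas installed."""


def infer_udf_kind(func):
    """Guess a UDF 'kind' from type hints alone (hypothetical helper, not Spark's API)."""
    hints = get_type_hints(func)
    return_hint = hints.pop("return", None)
    param_hints = list(hints.values())
    if return_hint is None or not param_hints:
        raise TypeError("expected type hints on the parameters and the return value")

    def is_series_iterator(hint):
        # Iterator[Series] has origin collections.abc.Iterator and args (Series,).
        return (get_origin(hint) is collections.abc.Iterator
                and get_args(hint) == (Series,))

    # Input/output types: Series in, Series out.
    if all(h is Series for h in param_hints) and return_hint is Series:
        return "series_to_series"
    # Cardinality: an iterator of batches in, an iterator of batches out.
    if (len(param_hints) == 1 and is_series_iterator(param_hints[0])
            and is_series_iterator(return_hint)):
        return "iterator_of_series"
    # Cardinality: Series in, plain scalar out (many rows reduced to one value).
    if all(h is Series for h in param_hints) and return_hint in (int, float, str, bool):
        return "series_to_scalar"
    raise NotImplementedError(f"unsupported hints: {param_hints} -> {return_hint}")


def plus_one(s: Series) -> Series: ...
def plus_one_batches(batches: Iterator[Series]) -> Iterator[Series]: ...
def mean(s: Series) -> float: ...


print(infer_udf_kind(plus_one))          # series_to_series
print(infer_udf_kind(plus_one_batches))  # iterator_of_series
print(infer_udf_kind(mean))              # series_to_scalar
```

In this sketch the first branch looks only at the element types, while the other two branches read cardinality off the shape of the hints, which is the part the message above is unsure about.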

In reply to Li Jin's message of Thu, Jan 2, 2020, 9:42 AM.

Re: Revisiting Python / pandas UDF (new proposal)

Hyukjin Kwon
I have opened a fairly large refactoring PR in preparation for this.
Basically, it groups all the related code into one sub-package, since the pandas- and PyArrow-related code is currently scattered across the codebase.
I would appreciate it if you guys could review it and give some feedback.

https://github.com/apache/spark/pull/27109

Thanks!


In reply to Li Jin's message of Sat, Jan 4, 2020, 5:11 AM.

Re: Revisiting Python / pandas UDF (new proposal)

Hyukjin Kwon
Hi all, I made a PR: https://github.com/apache/spark/pull/27165
Please have a look when you guys find some time.

I also addressed another point raised by Maciej, "A couple of less-intuitive pandas UDF types", in the same PR, because
the more I looked, the more I felt it should be dealt with together with the proposal.
 

In reply to Hyukjin Kwon's message of Mon, Jan 6, 2020, 10:52 PM.