[RESULT][VOTE] SPIP: Public APIs for extended Columnar Processing Support

Thomas Graves
Hi all,

The vote passed with 9 +1s (4 binding), 1 +0, and no -1s.

+1s (* = binding):
Bobby Evans*
Thomas Graves*
DB Tsai*
Felix Cheung*
Bryan Cutler
Kazuaki Ishizaki
Tyson Condie
Dongjoon Hyun
Jason Lowe

+0s:
Xiangrui Meng

Thanks,
Tom Graves

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: [RESULT][VOTE] SPIP: Public APIs for extended Columnar Processing Support

rxin
Thanks Tom.

I finally had time to look at the updated SPIP 10 minutes ago. I support the high-level idea and am +1 on the SPIP.

That said, I think the proposed API is too complicated and too invasive a change to the existing internals. A much simpler API would be to expose a columnar batch iterator interface, i.e. an uber column-oriented UDF with the ability to manage its life cycle. Once we have that, we can also refactor the existing Python UDFs to use that interface.

As I said earlier (a couple of months ago, when this was first surfaced?), I support the idea of enabling *external* column-oriented processing logic, but not changing Spark itself to have two processing modes, which is simply very complicated and would create a very high maintenance burden for the project.
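
For illustration only, here is a minimal sketch of what a "columnar batch iterator" UDF with life-cycle hooks might look like. It is not the API proposed in the SPIP and not an existing Spark API: the names `ColumnarBatchUDF`, `AddColumns`, and `run_udf` are hypothetical, and a batch is modeled as a plain dict of column name to list of values rather than a real columnar format.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-in for a columnar batch: column name -> column values.
ColumnarBatch = Dict[str, List[float]]


class ColumnarBatchUDF:
    """Hypothetical interface: consume an iterator of batches, yield batches."""

    def open(self) -> None:
        """Acquire resources before the first batch (no-op by default)."""

    def process(self, batches: Iterator[ColumnarBatch]) -> Iterator[ColumnarBatch]:
        raise NotImplementedError

    def close(self) -> None:
        """Release resources after the last batch (no-op by default)."""


class AddColumns(ColumnarBatchUDF):
    """Example UDF: appends a column holding the element-wise sum of two columns."""

    def __init__(self, left: str, right: str, out: str) -> None:
        self.left, self.right, self.out = left, right, out
        self.opened = False
        self.closed = False

    def open(self) -> None:
        self.opened = True

    def process(self, batches: Iterator[ColumnarBatch]) -> Iterator[ColumnarBatch]:
        for batch in batches:
            result = dict(batch)  # shallow copy; the input batch is left untouched
            result[self.out] = [a + b for a, b in zip(batch[self.left], batch[self.right])]
            yield result

    def close(self) -> None:
        self.closed = True


def run_udf(udf: ColumnarBatchUDF, batches: Iterable[ColumnarBatch]) -> List[ColumnarBatch]:
    """Hypothetical driver: the engine owns the life cycle, the UDF owns the logic."""
    udf.open()
    try:
        return list(udf.process(iter(batches)))
    finally:
        udf.close()  # always runs, even if processing a batch raises
```

In a design along these lines, the engine would only need to call `open`, hand over an iterator of batches, and call `close`, leaving all column-oriented logic external to the engine's own operators.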




On Wed, May 29, 2019 at 9:49 PM, Thomas graves <[hidden email]> wrote:



Re: [RESULT][VOTE] SPIP: Public APIs for extended Columnar Processing Support

Bobby Evans
Let me put up an initial patch, probably around the beginning of next week, and we can discuss the maintenance it involves there, once you have something more concrete to look at.

Thanks,

Bobby

On Wed, May 29, 2019 at 5:04 PM Reynold Xin <[hidden email]> wrote: