Arrow optimization in conversion from R DataFrame to Spark DataFrame

Hyukjin Kwon
Hi all,

I am trying to introduce an Arrow optimization for R by reusing the PySpark Arrow optimization.

It makes the conversion from an R DataFrame to a Spark DataFrame roughly 900% to 1200% faster.

It seems to be working fine so far; however, I would appreciate it if you could take some time to look at the PR (https://github.com/apache/spark/pull/22954) so that we can go ahead as soon as the R API of Arrow is released.

More importantly, I would like more people who are familiar with the Arrow R API and also interested in the Spark side to get involved. I have already cc'ed some people I know, but please come, review, and discuss both the Spark side and the Arrow side.

Thanks.
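
As a rough illustration of why a columnar format such as Arrow can help with this kind of conversion, the Python sketch below contrasts encoding a small table one value at a time, row by row, with encoding each column as one contiguous buffer. It uses only the standard library. The function names are hypothetical, and this is neither the code in the PR nor Arrow's actual format, which also carries a schema and other metadata.

```python
import struct


def rows_to_columns(rows):
    """Transpose a list of equal-length row tuples into a list of columns."""
    return [list(col) for col in zip(*rows)]


def encode_rowwise(rows):
    """Serialize one value at a time, row by row (one pack call per value)."""
    out = bytearray()
    for row in rows:
        for value in row:
            out += struct.pack("<d", value)
    return bytes(out)


def encode_columnar(rows):
    """Serialize each column as one contiguous buffer (one pack call per column)."""
    out = bytearray()
    for col in rows_to_columns(rows):
        out += struct.pack(f"<{len(col)}d", *col)
    return bytes(out)


# A toy table with two float columns and three rows.
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
row_bytes = encode_rowwise(rows)
col_bytes = encode_columnar(rows)
```

Both encodings hold the same values in the same number of bytes; what differs is the layout and the number of per-value operations. The columnar version does work per column rather than per cell, which is the general idea behind batch-oriented transfer.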

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

Felix Cheung
Very cool!

 

From: Hyukjin Kwon <[hidden email]>
Sent: Thursday, November 8, 2018 10:29 AM
To: dev
Subject: Arrow optimization in conversion from R DataFrame to Spark DataFrame
 

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

Shivaram Venkataraman
Thanks Hyukjin! Very cool results

Shivaram
On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung <[hidden email]> wrote:

>
> Very cool!

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

Bryan Cutler
Great work Hyukjin!  I'm not too familiar with R, but I'll take a look at the PR.

Bryan

On Fri, Nov 9, 2018 at 9:19 AM Shivaram Venkataraman <[hidden email]> wrote:
Thanks Hyukjin! Very cool results

Shivaram

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

Hyukjin Kwon
Thanks guys! 👍

On Sat, Nov 10, 2018 at 7:35 AM Bryan Cutler <[hidden email]> wrote:
Great work Hyukjin!  I'm not too familiar with R, but I'll take a look at the PR.

Bryan
