Push-based shuffle SPIP

mshen
We raised this SPIP ticket in
https://issues.apache.org/jira/browse/SPARK-30602 earlier this year.
Since then, we have made progress on multiple fronts, including:

* Our work has been published in VLDB 2020. The final version of the paper is
attached to the SPIP ticket.
* We have further enhanced and productionized this work at LinkedIn, and
have enabled production flows to adopt the new push-based shuffle mechanism,
with good results.
* We have also recently ported our push-based shuffle changes to the OSS Spark
master branch so that others can try it out. Details of this
branch are in this doc:
<https://docs.google.com/document/d/16yOfI8P_O3V6hx_FnWT22jeDIItgXuXfaDAV0fJDTqQ/edit#>
* The SPIP doc
<https://docs.google.com/document/d/1mYzKVZllA5Flw8AtoX7JUcXBOnNIDADWRbJ7GI6Y71Q/edit>
has also been updated to reflect more recent designs.
* We have also discussed this work with multiple companies that share a
similar interest in it.

We would like to resume the community discussion of this SPIP and
push for a vote on it.

-----
Min Shen
Staff Software Engineer
LinkedIn
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

Re: Push-based shuffle SPIP

Mridul Muralidharan
Hi,

Thanks for sending out the proposal, Min!
For the SPIP requirements, I am willing to act as the shepherd for this proposal.

The JIRA, the paper, and the proposal together provide the high-level design and implementation details.
The VLDB paper discusses in detail the performance gains from the in-house deployment of push-based shuffle.

It would be great to get feedback from the community on this feature before we go to a vote.


Regards,
Mridul

In reply to mshen's post of Mon, Aug 24, 2020 at 4:32 PM.

Re: Push-based shuffle SPIP

mshen
In reply to this post by mshen
The linked doc with detailed information about the branch does not appear to be
publicly shareable.
We have created a copy of the doc that should be publicly accessible:
https://docs.google.com/document/d/1Q5m7YAp0HyG_TNFL4p_bjQgzzw33ik5i49Vr86UNZgg/edit?usp=sharing