Spark-3.0 - performance degradation


prudenko
I'm seeing a performance degradation for RDD shuffle jobs in Spark-3.0.
Environment:
Spark-3.0: built from commit ba4212660305c6555ae16b10c6bbaf6114c4d830
Spark-2.4.2: release build (chosen only to get scala-2.12; results are the same for spark-2.4.5)
Dataset of size 1800Gb, 20 executors, 25 cores per executor:

3.0 results:
image.png
2.4.2:
image.png
Event timeline for 3.0 looks very weird:

image.png
Compared to 2.4:
image.png
Everything runs with default settings. I ran several different workloads of different sizes, with different numbers of executors, but the result is the same. It looks like a scheduling issue in 3.0.

Is anyone else facing the same issue?

Thanks,
Peter Rudenko

Re: Spark-3.0 - performance degradation

beliefer
Can you provide configuration information?
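
For anyone wanting to answer this concretely, here is a minimal sketch (not something the original poster ran) for comparing two configuration dumps, one per Spark version. It assumes each dump is plain text with one `key=value` or `key value` entry per line, as you might copy from the Environment tab of the Spark UI or from `spark-defaults.conf`. The function names `parse_conf` and `diff_conf`, the key `spark.example.setting` and its values are hypothetical and only there for illustration; the executor counts are the ones quoted in the thread.

```python
import re

# One entry per line: a key, then '=' or whitespace, then the value.
# Values may themselves contain '=' (e.g. JVM options), so only the
# first separator is treated as the split point.
_ENTRY = re.compile(r"([^\s=]+)\s*[=\s]\s*(.*)")


def parse_conf(text):
    """Parse a config dump into a dict, skipping blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _ENTRY.match(line)
        if match:
            conf[match.group(1)] = match.group(2).strip()
    return conf


def diff_conf(old, new):
    """Return {key: (old_value, new_value)} for keys that differ.

    A value of None means the key is absent on that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a != b:
            changes[key] = (a, b)
    return changes


# Illustrative input only: 'spark.example.setting' is a made-up key.
dump_24 = """
spark.executor.instances=20
spark.executor.cores=25
spark.example.setting=old
"""
dump_30 = """
# same cluster, other build
spark.executor.instances 20
spark.executor.cores 25
spark.example.setting new
"""

for key, (before, after) in diff_conf(parse_conf(dump_24), parse_conf(dump_30)).items():
    print(f"{key}: {before!r} -> {after!r}")
```

Posting only the keys that differ keeps the report short and makes it easier to spot a setting that changed between the two builds.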





--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Spark-3.0 - performance degradation

beliefer
In reply to this post by prudenko
Can you share the configuration you ran with?



Re: Spark-3.0 - performance degradation

beliefer
In reply to this post by prudenko
I tested it and cannot reproduce the issue.
I built Spark-3.1.0 and Spark-2.3.1.
After many tests, I found little difference between them:
each one wins some runs and loses others.
And judging by the event timeline, Spark-3.1.0 looks more accurate.


