Adaptive Query Execution performance results in 3TB TPC-DS

Adaptive Query Execution performance results in 3TB TPC-DS

Jia, Ke A

Hi all,

We have completed the Spark 3.0 Adaptive Query Execution (AQE) performance tests on 3TB TPC-DS on a 5-node Cascade Lake cluster. With AQE, 2 queries gain more than 1.5x in performance and 37 queries gain more than 1.1x. No query shows significant performance degradation. The detailed performance results and key configurations are shown here. Based on these results, we recommend that users turn on AQE in Spark 3.0. If you encounter any bugs or see possible improvements when enabling AQE, please file the related JIRAs. Thanks.
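
For readers who want to try the recommendation, here is a minimal sketch of how AQE might be switched on from the command line. Only `spark.sql.adaptive.enabled` follows directly from the post; the other two keys are settings as named in the Spark 3.0.0 release and are included as an assumption, not as the configuration used in this benchmark. The helper `conf_flags` and the job name `your_job.py` are hypothetical.

```python
# Sketch: build spark-submit --conf flags that turn AQE on in Spark 3.0.
# Assumption: the two sub-feature keys below are illustrative; the actual
# benchmark configuration is in the linked results, not reproduced here.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",  # master switch, off by default in 3.0
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def conf_flags(conf):
    """Return a flat list of spark-submit arguments for the given settings."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Print a command line that could be pasted into a shell.
    print(" ".join(["spark-submit"] + conf_flags(AQE_CONF) + ["your_job.py"]))
```

The same keys can also be set in `spark-defaults.conf` or on a session's runtime configuration; the command-line form is used here only because it needs no Spark installation to illustrate.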

 

Regards,

Jia Ke

 

Re: Adaptive Query Execution performance results in 3TB TPC-DS

cloud0fan
Thanks for providing the benchmark numbers! The result is very promising and I'm looking forward to seeing more feedback from real-world workloads.

On Wed, Feb 12, 2020 at 3:43 PM Jia, Ke A <[hidden email]> wrote:

Hi all,

We have completed the Spark 3.0 Adaptive Query Execution (AQE) performance tests on 3TB TPC-DS on a 5-node Cascade Lake cluster. With AQE, 2 queries gain more than 1.5x in performance and 37 queries gain more than 1.1x. No query shows significant performance degradation. The detailed performance results and key configurations are shown here. Based on these results, we recommend that users turn on AQE in Spark 3.0. If you encounter any bugs or see possible improvements when enabling AQE, please file the related JIRAs. Thanks.

 

Regards,

Jia Ke

 

RE: Adaptive Query Execution performance results in 3TB TPC-DS

Jia, Ke A

Hi Amogh,

Thanks for your interest in the AQE work.

 

> Were any table stats available for TPC-DS during the runs?

We used the default configurations and did not set any special configurations (such as CBO) to collect table stats, either with AQE enabled or with it disabled. AQE relies mainly on runtime statistics, not table stats, for further optimization, so table stats likely have little effect on this benchmark. Thanks.
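
For context, here is a rough sketch of what collecting table stats for the cost-based optimizer (CBO) would have involved, had it been done. `spark.sql.cbo.enabled` and the `ANALYZE TABLE ... COMPUTE STATISTICS` statement are standard Spark SQL; the helper `analyze_statements` is hypothetical, and the table names are just three TPC-DS tables chosen for illustration.

```python
# Sketch: the extra setup a CBO-based run would need (not used in this benchmark).
CBO_CONF = {"spark.sql.cbo.enabled": "true"}  # off by default in Spark 3.0


def analyze_statements(tables, all_columns=False):
    """Return Spark SQL statements that compute table (and optionally column) stats."""
    suffix = " FOR ALL COLUMNS" if all_columns else ""
    return [f"ANALYZE TABLE {t} COMPUTE STATISTICS{suffix}" for t in tables]


if __name__ == "__main__":
    # Illustrative subset of TPC-DS tables; a real run would list all of them.
    for stmt in analyze_statements(["store_sales", "date_dim", "item"]):
        print(stmt + ";")
```

Without these steps the optimizer has no table-level statistics to use at planning time, which is consistent with the point above that AQE's gains come from statistics gathered at runtime.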

 

Regards,

Jia Ke

 

From: Amogh Margoor <[hidden email]>
Sent: Friday, February 14, 2020 5:02 AM
To: Wenchen Fan <[hidden email]>
Cc: Jia, Ke A <[hidden email]>; [hidden email]
Subject: Re: Adaptive Query Execution performance results in 3TB TPC-DS

 

Thanks, Jia Ke, for the numbers; they look promising.

Were any table stats available for TPC-DS during the runs?

 

On Thu, Feb 13, 2020 at 4:07 AM Wenchen Fan <[hidden email]> wrote:

Thanks for providing the benchmark numbers! The result is very promising and I'm looking forward to seeing more feedback from real-world workloads.

 
