[DISCUSS] Cascades style CBO for Spark SQL


[DISCUSS] Cascades style CBO for Spark SQL

吴晓菊
Hi All,

The current Spark CBO implements a cost-based multi-way join reordering algorithm based on the System R paper [Access Path Selection in a Relational Database Management System, SIGMOD '79]. When building m-way joins, it uses a bottom-up approach: it puts all items (base joined nodes) into level 0, builds all two-way joins at level 1 from the plans at level 0 (single items), then all three-way joins at level 2, and so on. The algorithm considers all combinations, including left-deep, bushy, and right-deep trees, and prunes Cartesian-product candidates.
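
For readers less familiar with the algorithm, here is a minimal, self-contained sketch of that level-by-level dynamic programming loop. All names (Plan, joinCost, connected, ...) are illustrative simplifications, not Spark's actual internals:

  // Sketch of System R style bottom-up join enumeration.
  // `best` maps a set of item ids to the single cheapest plan kept for it.
  object BottomUpSketch {
    case class Plan(items: Set[Int], tree: String, cost: Double)

    // Stand-ins for the real cost model and join-graph connectivity test.
    def joinCost(l: Plan, r: Plan): Double = l.cost + r.cost + 1.0
    def connected(l: Plan, r: Plan): Boolean = true // false would prune a CP

    def search(relations: Seq[String]): Plan = {
      // Level 0: every base relation is a single-item plan.
      var best: Map[Set[Int], Plan] = relations.zipWithIndex.map {
        case (name, i) => Set(i) -> Plan(Set(i), name, 0.0)
      }.toMap

      for (level <- 2 to relations.size) {
        val smaller = best.values.toSeq // plans built at earlier levels
        for {
          l <- smaller; r <- smaller
          if l.items.size + r.items.size == level // allows bushy trees
          if (l.items & r.items).isEmpty && connected(l, r)
        } {
          val p = Plan(l.items ++ r.items,
                       s"(${l.tree} JOIN ${r.tree})", joinCost(l, r))
          // Keep only the cheapest plan per item set: the root of
          // limitation 2 below, since alternatives are discarded early.
          if (best.get(p.items).forall(p.cost < _.cost)) best += (p.items -> p)
        }
      }
      best(relations.indices.toSet)
    }
  }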

However, we have found several limitations in the current CBO implementation:

1. The current CBO is a rule in the logical phase: it outputs only one logical plan to the physical phase, but we cannot guarantee that the best plan in the logical phase is still the best after physical optimization.

2. The current bottom-up approach keeps only one best plan per level, but combining the best plans of each level does not guarantee the globally best plan.

3. The current cost formula is cost = weight * cardinality + (1.0 - weight) * size, where the first term roughly corresponds to CPU cost and the second to I/O cost (see the sketch after this list). The formula is oversimplified: it treats all join implementations the same way and does not take shuffle and sort costs into account, even though Shuffle Exchange is one of the heaviest physical operators in Spark SQL.

4. Transitively equivalent join conditions are not supported. For example, (A join B join C on a=b and b=c) can be reordered to (A join C join B on a=c and c=b), which may be more efficient. In the current implementation, however, the condition "a=c" is never derived, so "A join C" is treated as a Cartesian product and excluded.
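
To make point 3 concrete, here is a simplified sketch of that cost function in standalone form (illustrative, not the exact code of the join reorder rule; the weight comes from the spark.sql.cbo.joinReorder.card.weight configuration):

  // Simplified sketch of the current cost comparison.
  case class Cost(cardinality: BigInt, size: BigInt)

  def cost(c: Cost, weight: Double): Double =
    weight * c.cardinality.doubleValue + (1.0 - weight) * c.size.doubleValue

  def betterThan(a: Cost, b: Cost, weight: Double): Boolean =
    cost(a, weight) < cost(b, weight)
  // Note what is missing: nothing distinguishes a hash join from a
  // sort-merge join, and no term models shuffle or sort at all.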

The bottom-up approach originated in the System R optimizer (1979). It quickly became a standard, and many modern relational database optimizers are "System R style", for example Oracle, PostgreSQL, MySQL, and DB2.

Later, new optimizer styles were invented: Volcano (1993) and Cascades (1995). They are less famous than System R but are still widely used in practice: Microsoft SQL Server, Greenplum Orca, and Apache Calcite. They implement a top-down transformational search algorithm and provide an extensible optimization framework.

A top-down optimization framework can address the limitations above, since it explores a more complete search space and combines the logical and physical phases for more accurate cost estimation. As for the efficiency of keeping all alternative plans, Cascades also provides pruning to shrink the search space.

What about implementing a new Cascades-style CBO for Spark SQL?
It could be a new rule in the current "Planner" that reads the logical plan produced by the heuristic rules and, after join reordering and physical implementation rules, outputs the physical plan with the least cost.
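
To make the proposal more concrete, here is a highly simplified sketch of the central Cascades data structure, the memo of equivalence groups, and its top-down search. All names are illustrative, not a proposed Spark API:

  // A Group holds logically equivalent expressions; expressions point at
  // child Groups, so alternative plans share common subtrees and logical
  // and physical optimization happen inside one search.
  class Group(var exprs: List[GroupExpr]) {
    var bestCost: Double = Double.MaxValue
    var bestExpr: Option[GroupExpr] = None
  }
  case class GroupExpr(op: String, children: List[Group])

  def optimize(group: Group, upperBound: Double): Double = {
    if (group.bestExpr.isDefined) return group.bestCost // memoized
    for (expr <- group.exprs) {
      // In a real engine, transformation rules (commutativity,
      // associativity, equivalent-condition inference) and implementation
      // rules (hash join, sort-merge join, ...) would expand group.exprs
      // here before costing.
      val childrenCost = expr.children.map(c => optimize(c, upperBound)).sum
      val total = localCost(expr) + childrenCost
      // Branch-and-bound: keep only winners under the current upper bound.
      if (total < upperBound && total < group.bestCost) {
        group.bestCost = total
        group.bestExpr = Some(expr)
      }
    }
    group.bestCost
  }
  def localCost(e: GroupExpr): Double = 1.0 // placeholder cost model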

Xiaoju Wu
Phone:+86 17717640807


Re: [DISCUSS] Cascades style CBO for Spark SQL

Xiao Li
Hi, Xiaoju,

Thanks for sending this to the dev list. The current join reordering rule is just a stats-based optimizer rule. Either top-down or bottom-up optimization can achieve the same level of optimized plans. DB2 uses bottom-up. In the future, we plan to move the stats-based join reordering rule to the cost-based planner, which is the right place for this rule based on the original design of Spark SQL.

Actually, building a good cost model is much more difficult than implementing such a classic framework, especially since Spark does not own the data. Also, we need to compute incremental stats instead of always recomputing them.

Cheers,

Xiao




Re: [DISCUSS] Cascades style CBO for Spark SQL

吴晓菊
Hi All,

Takeshi Yamamuro gave some comments on this topic on Twitter. After more research, here are corrections and updates to my understanding of bottom-up versus top-down.

Bottom-up and top-down are just two strategies for enumerating join orders and generating the search space. Both can arrive at the same best plan if the statistics and cost model are the same. (The plan is only "best" theoretically, since the cost model is an estimate of performance under an assumed workload and environment, and join selectivity is also an estimate.)

The performance of the optimizer therefore depends not on bottom-up versus top-down as such, but on the enumeration algorithms used within either framework.

Both bottom-up and top-down have many enumeration algorithms, some of which are similar and compete with each other:

Bottom-up:
     DPsize, DPsub, DPccp
     (Paper: "Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products")
Top-down:
     Basic algorithms based on commutativity and associativity: RS-B1, RS-B2
     Graph algorithms: TDMinCutLazy, TDMinCutBranch, TDMinCutConservative, ...
     (Papers: "Optimal Top-Down Join Enumeration",
              "Optimizing Join Enumeration in Transformation-based Query Optimizers",
              "Effective and Robust Pruning for Top-Down Join Enumeration Algorithms")

Before the graph-based algorithms for top-down enumeration, bottom-up was known to be more efficient, especially on a Cartesian-product-free search space, while top-down had the advantage of being able to prune. With the graph-based algorithms, top-down enumeration can be CP-free as well. More details are provided in the paper "Optimizing Join Enumeration in Transformation-based Query Optimizers".
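
As an illustration of both points, here is a sketch of top-down enumeration with accumulated-cost pruning; the `partitions` helper is a stand-in for the graph-based partitioning (e.g. the min-cut family above) that makes the search CP-free:

  // Illustrative top-down join enumeration with branch-and-bound pruning.
  // `partitions` must split an item set into two connected, disjoint
  // halves; that is exactly what the graph-based algorithms provide.
  case class Plan(tree: String, cost: Double)

  def bestPlan(items: Set[Int], limit: Double,
               partitions: Set[Int] => Seq[(Set[Int], Set[Int])],
               leaf: Int => Plan): Option[Plan] = {
    if (items.size == 1) return Some(leaf(items.head))
    var best: Option[Plan] = None
    var bound = limit
    for ((l, r) <- partitions(items)) {
      for {
        lp <- bestPlan(l, bound, partitions, leaf)
        // Pruning: the right half may spend at most what is left of the
        // bound after paying for the left half.
        rp <- bestPlan(r, bound - lp.cost, partitions, leaf)
        p = Plan(s"(${lp.tree} JOIN ${rp.tree})", lp.cost + rp.cost + 1.0)
        if p.cost < bound
      } { bound = p.cost; best = Some(p) }
    }
    best
  }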

In conclusion, my suggestion is to implement a Cascades-like top-down optimizer based on the best graph algorithm, CP-free and with pruning enabled, together with a good cost model based on physical implementations: for example, hash join and sort-merge join consume the same input and produce the same output, but the time spent on reading, computing, and copying differs. The details can be similar to what is done in Hive (https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive).
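
As a toy illustration of such an implementation-aware cost model (all constants and formulas below are made up; a real model would be calibrated against measurements):

  // Same logical join, same input and output, different physical work.
  case class Stats(rows: Long, bytes: Long)

  def shuffleCost(s: Stats): Double = s.bytes * 0.5 // network + (de)serialization
  def sortCost(s: Stats): Double =
    if (s.rows <= 1) 0.0 else s.rows * math.log(s.rows.toDouble)

  // Hash join: build a hash table on one side, stream the other; no sort.
  def hashJoinCost(build: Stats, probe: Stats): Double =
    shuffleCost(build) + shuffleCost(probe) + build.rows * 2.0 + probe.rows * 1.0

  // Sort-merge join: both sides are shuffled and sorted before merging.
  def sortMergeJoinCost(l: Stats, r: Stats): Double =
    shuffleCost(l) + shuffleCost(r) + sortCost(l) + sortCost(r) +
      (l.rows + r.rows) * 1.0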

[hidden email] @Yamamuro, any comments?

Thanks,
Xiaoju



吴晓菊 <[hidden email]> wrote on Wednesday, September 26, 2018 at 10:39 AM:
Hi Xiao

I quite agree with you that a good cost model is important, rather than the current stats-based cost.

However, I think the bottom-up framework itself has a limitation, since it keeps only one best plan per level, which does not necessarily lead to the best plan at the final level. To get the exact best plan overall in the current bottom-up framework, you would need to enumerate all alternative plans and compare their costs.

The Volcano/Cascades framework provides a more efficient solution, already used in Calcite, Greenplum, SQL Server, etc.

So I think both framework and cost model are important. 

We are now working on a Cascades POC and also considering a new cost model. We want to know whether the community is interested in this feature; if yes, we can share a more detailed design and discuss it with you.

Thanks,
Xiaoju


