

Hi All,
The current Spark CBO implements a cost-based multi-way join reordering algorithm based on the System R paper [Access Path Selection, SIGMOD '79]. When building m-way joins, it uses a bottom-up approach: it puts all items (base joined nodes) into level 0, then builds all two-way joins at level 1 from the plans at level 0 (single items), then builds all 3-way joins, and so on. The algorithm considers all combinations, including left-deep trees, bushy trees, and right-deep trees. It also prunes Cartesian-product candidates.
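The level-wise, bottom-up enumeration described above can be sketched roughly as follows. This is a minimal illustration, not Spark's actual implementation: the relation names, cardinality table, and toy cost function are all assumptions for the example.

```python
from itertools import combinations

# A minimal sketch of bottom-up, level-wise join enumeration: level 0 holds
# single relations, level k holds the best plan for every joinable k-relation
# subset, built from smaller subsets. Cost numbers are illustrative only.
def bottom_up_join_order(relations, card, joinable):
    """relations: list of names; card: {frozenset of names: row estimate};
    joinable: predicate saying whether two sub-plans share a join condition
    (used to prune Cartesian products)."""
    # Level 0: each base relation is its own best plan.
    best = {frozenset([r]): (card[frozenset([r])], r) for r in relations}
    n = len(relations)
    for level in range(2, n + 1):            # level k = joins of k relations
        for subset in combinations(relations, level):
            s = frozenset(subset)
            candidates = []
            # Split s into two smaller subsets whose best plans are known.
            for k in range(1, level):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r = s - l
                    if l in best and r in best and joinable(l, r):
                        lc, lp = best[l]
                        rc, rp = best[r]
                        # toy cost: input costs + estimated output rows
                        cost = lc + rc + card.get(s, lc * rc)
                        candidates.append((cost, (lp, rp)))
            if candidates:
                # keep only ONE best plan per subset (see limitation 2 below)
                best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]
```

For a chain A–B–C with a small B, this picks a plan that joins the small sides first while never building the Cartesian product A × C.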
However, we have found several limitations in the current CBO implementation:
1. The current CBO is a rule in the logical phase: it outputs only one logical plan to the physical optimization phase, so we cannot be sure that the best plan in the logical phase is still the best after physical optimization.
2. In the current bottom-up approach, we keep only one best plan per level, so we cannot be sure of getting the overall best plan from the best plan of each level.
3. The current cost formula is cost = weight * cardinality + (1.0 - weight) * size, where the first term roughly corresponds to the CPU cost and the second term roughly corresponds to the I/O cost. This formula is oversimplified: it treats all join implementations the same way and does not take shuffle and sort cost into consideration, even though Shuffle Exchange is one of the heaviest physical operators in Spark SQL.
4. Equivalent join conditions are not supported. For example, (A join B join C on a=b and b=c) can be reordered to (A join C join B on a=c and c=b), which may be more efficient. But in the current implementation we will not derive the condition "a=c", so "A join C" is treated as a Cartesian product and excluded.
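The missing derivation in limitation 4 is just the transitive closure of the equality predicates. A small sketch of computing it with a union-find structure (column names here are illustrative):

```python
# Derive implied equi-join conditions by grouping columns that are
# transitively equal, so that a=b and b=c also yields a=c.
def equivalence_classes(equalities):
    """equalities: iterable of (col1, col2) pairs from 'col1 = col2' predicates.
    Returns a mapping column -> representative of its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:              # path-compressing union-find
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equalities:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                # merge the two classes
    return {c: find(c) for c in parent}

def implied(classes, c1, c2):
    """True if c1 = c2 is implied by the given predicates."""
    return c1 in classes and c2 in classes and classes[c1] == classes[c2]
```

With this closure available, the reordered form (A join C join B) keeps a valid join condition "a=c" and is no longer pruned as a Cartesian product.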
The bottom-up approach originated in the System R optimizer (1979). It quickly became a standard, and many modern relational database optimizers are "System R style", for example Oracle, PostgreSQL, MySQL, and DB2.
Over time, new optimizer styles were invented: Volcano (1993) and Cascades (1995). They are less famous than System R but are still widely used in practice: Microsoft SQL Server, Greenplum Orca, and Apache Calcite. They implement a top-down transformational search algorithm and provide an extensible optimization framework.
A top-down optimization framework can help us solve the above limitations, since it has a more complete search space and combines the logical and physical phases for more accurate cost estimation. As for the efficiency of keeping all alternative plans, Cascades also provides pruning to reduce the search space.
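The top-down search with pruning can be sketched as follows: recurse on subsets of relations, memoize the best plan per subset (roughly corresponding to Cascades' "groups"), and skip a split as soon as its partial cost already cannot beat the best alternative found so far. Again, this is an illustrative toy, not the actual Cascades algorithm.

```python
from itertools import combinations

# Top-down, memoized join enumeration with simple cost-bound pruning.
# card maps frozensets of relation names to row estimates; joinable prunes
# Cartesian products; memo caches the best (cost, tree) per subset.
def best_plan(s, card, joinable, memo):
    s = frozenset(s)
    if s in memo:
        return memo[s]
    if len(s) == 1:
        memo[s] = (card[s], next(iter(s)))
        return memo[s]
    best_cost, best_tree = float('inf'), None
    # subsets without an estimate are treated as prohibitively expensive
    out = card.get(s, float('inf'))
    items = sorted(s)
    for k in range(1, len(items)):
        for left in combinations(items, k):
            l = frozenset(left)
            r = s - l
            if not joinable(l, r):
                continue                   # skip Cartesian products
            lc, lp = best_plan(l, card, joinable, memo)
            if lc + out >= best_cost:
                continue                   # pruning: this split cannot win
            rc, rp = best_plan(r, card, joinable, memo)
            if lc + rc + out < best_cost:
                best_cost, best_tree = lc + rc + out, (lp, rp)
    memo[s] = (best_cost, best_tree)
    return memo[s]
```

In a real Cascades optimizer each memo group would also hold physical alternatives (hash join, sort-merge join, ...) so the bound prunes across logical and physical choices at once.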
What about implementing a new Cascades-style CBO for Spark SQL? It could be a new rule in the current "Planner" that reads a logical plan after the heuristic rules and outputs the best physical plan, i.e. the one with the least cost after reordering and physical implementation rules.


Hi, Xiaoju,
Thanks for sending this to the dev list. The current join reordering rule is just a stats-based optimizer rule. Either top-down or bottom-up optimization can achieve the same level of optimized plans. DB2 uses bottom-up. In the future, we plan to move the stats-based join reordering rule into the cost-based planner, which is the right place for this rule based on the original design of Spark SQL.
Actually, building a good cost model is much more difficult than implementing such a classic framework, especially when Spark does not own the data. Also, we need to compute incremental stats instead of always recomputing the stats.
Cheers,
Xiao


Hi All,
Takeshi Yamamuro gave some comments on this topic on Twitter. After more research, here are corrections and updates to my understanding of bottom-up and top-down.
Bottom-up and top-down are just two strategies to enumerate join orders and generate the search space. Both can find the same best plan if the statistics and cost model are the same. (The best plan is "best" only theoretically, since the cost model is an estimation of the performance and workload of the environment, and join selectivity is also based on estimation.)
The performance of the optimizer thus depends not on bottom-up vs. top-down but on the algorithms used within the bottom-up or top-down framework.
Both bottom-up and top-down have many algorithms, some of which are similar and compete with each other.
Bottom-up: DPsize, DPsub, DPccp (paper: "Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products").
Top-down: basic algorithms based on commutativity and associativity (RS-B1, RS-B2); graph algorithms (TDMinCutLazy, TDMinCutBranch, TDMinCutConservative, ...). Papers: "Optimal Top-Down Join Enumeration", "Optimizing Join Enumeration in Transformation-based Query Optimizers", "Effective and Robust Pruning for Top-Down Join Enumeration Algorithms".
Before the graph algorithms for top-down were developed, bottom-up was known to be more efficient, especially for a Cartesian-product-free search space, while top-down had the capability to do pruning. With the graph algorithms, top-down can also be CP-free. More details are provided in the paper "Optimizing Join Enumeration in Transformation-based Query Optimizers".
In conclusion, my suggestion is to implement a Cascades-like top-down optimizer based on the best graph algorithm, CP-free and with pruning enabled, together with a good cost model based on physical implementations. For example, hash join and sort-merge join have the same input and output, but the time spent on reading, computing, and copying is different. The details can be similar to what is done in Hive (https://cwiki.apache.org/confluence/display/Hive/Costbased+optimization+in+Hive).
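A physical-implementation-aware cost model of the kind suggested here could look roughly like the sketch below. To be clear, the weights (CPU_W, IO_W, NET_W) and all formulas are illustrative assumptions for discussion, not Spark's or Hive's actual cost model.

```python
import math

# Toy per-implementation cost functions: hash join and sort-merge join get
# different formulas, and shuffle (network) cost is modeled explicitly.
CPU_W, IO_W, NET_W = 0.2, 0.3, 0.5        # illustrative weights

def shuffle_cost(rows, row_size):
    return NET_W * rows * row_size        # network transfer dominates shuffle

def hash_join_cost(build_rows, probe_rows, row_size):
    # build a hash table on the smaller side, probe with the larger side
    cpu = CPU_W * (build_rows + probe_rows)
    io = IO_W * build_rows * row_size     # memory/spill for the hash table
    return cpu + io

def sort_merge_join_cost(left_rows, right_rows, row_size):
    # both inputs must be sorted (n log n), then merged linearly
    cpu = CPU_W * (left_rows * math.log2(max(left_rows, 2)) +
                   right_rows * math.log2(max(right_rows, 2)) +
                   left_rows + right_rows)
    io = IO_W * (left_rows + right_rows) * row_size    # sort spill
    return cpu + io

def join_cost(impl, left_rows, right_rows, row_size, needs_shuffle=True):
    net = shuffle_cost(left_rows + right_rows, row_size) if needs_shuffle else 0
    if impl == 'hash':
        build, probe = sorted((left_rows, right_rows))
        return net + hash_join_cost(build, probe, row_size)
    return net + sort_merge_join_cost(left_rows, right_rows, row_size)
```

With such a model the planner can prefer a hash join over a sort-merge join for the same inputs, and a broadcast-style join (no shuffle) over a shuffled one, which the single cardinality/size formula above cannot express.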
Hi Xiao,
I quite agree with you that a good cost model is important, instead of the current stats-based cost.
But I think the bottom-up framework itself has a limitation, since it keeps only one best plan per level, and that does not necessarily yield the best plan at the final level. To get the exact best plan in the current bottom-up framework, you would need to enumerate all alternative plans and compare their costs.
The Volcano/Cascades framework provides a more efficient solution, which is already used in Calcite, Greenplum, SQL Server, and others.
So I think both the framework and the cost model are important.
We are now working on a Cascades POC and are also considering a new cost model. We want to know if the community is interested in this feature. If yes, we can share a more detailed design and discuss it with you.

