[SPARK SQL] Make max multi table join limit configurable in OptimizeSkewedJoin
This is my first time contributing, so I am reaching out by email before creating a JIRA ticket and PR. I would like to propose a small enhancement to OptimizeSkewedJoin.
Currently, OptimizeSkewedJoin has a hardcoded limit on the number of tables in a multi-table join (limit = 2). For queries with more joins (n > 2), OptimizeSkewedJoin will only be considered for two of the n joins.
A code comment suggests the default of 2 was chosen because there would otherwise be too many complex combinations to consider; however, it would be good to let users override this via Spark configuration, since the acceptable level of complexity is use-case dependent.
Proposed change:
- Add spark.sql.adaptive.skewJoin.maxMultiTableJoin (default = 2) to SQLConf
- Update OptimizeSkewedJoin to read this configuration instead of the hardcoded limit
- If the user sets a value > 2, log a warning to indicate the added complexity
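To make the proposal concrete, here is a rough sketch of what the config entry and the warning could look like. This is illustrative only: the entry name matches the proposal above, but the `.version` string, the exact doc text, and the placement inside OptimizeSkewedJoin are assumptions, not actual Spark source.

```scala
// Hypothetical SQLConf entry (sketch, not the real Spark code):
val SKEW_JOIN_MAX_MULTI_TABLE_JOIN =
  buildConf("spark.sql.adaptive.skewJoin.maxMultiTableJoin")
    .doc("Maximum number of joined tables that OptimizeSkewedJoin will " +
      "consider when optimizing multi-table joins. Values above 2 can " +
      "produce many partition-split combinations to evaluate.")
    .version("4.1.0") // placeholder; would be set to the actual release
    .intConf
    .checkValue(_ >= 2, "The value must be at least 2.")
    .createWithDefault(2)

// Hypothetical use inside OptimizeSkewedJoin (sketch):
val maxJoins = conf.getConf(SQLConf.SKEW_JOIN_MAX_MULTI_TABLE_JOIN)
if (maxJoins > 2) {
  logWarning(s"spark.sql.adaptive.skewJoin.maxMultiTableJoin is set to " +
    s"$maxJoins; the number of split combinations grows quickly with the " +
    "number of joined tables and may increase planning time.")
}
```

Users could then opt in per job, e.g. `--conf spark.sql.adaptive.skewJoin.maxMultiTableJoin=3`, while the default of 2 keeps current behavior unchanged.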
If people think this is a good and useful idea, please let me know and I will proceed with the JIRA ticket and PR.