John: working on resolution. DSv2 resolution is straightforward; the difficulty is ensuring a smooth transition from v1 to v2.
Ryan: table resolution will also be used for inserts. Once select is done, insert is next.
John: the PR may include insert as well.
Add default v2 catalog:
Ryan: A default catalog is needed for CTAS support when the source is v2.
Ryan: A pass-through v2 catalog that uses SessionCatalog should be available as the default
Wenchen: this should have a design doc
Ryan: Agreed. The PR is for early discussion and prototyping.
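The pass-through idea Ryan describes can be illustrated with a minimal sketch. This is not Spark's API: `SessionCatalog` here is a hypothetical stand-in for the v1 session catalog, and `PassThroughV2Catalog`, `load_table`, and `create_table` are names made up for illustration. The only point shown is that the default v2 catalog would hold no state of its own and forward every call to the existing session catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SessionCatalog:
    """Hypothetical stand-in for the v1 session catalog (not Spark's class)."""
    tables: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def get_table_metadata(self, database: str, table: str) -> dict:
        return self.tables[(database, table)]

    def create_table(self, database: str, table: str, metadata: dict) -> None:
        self.tables[(database, table)] = metadata


class PassThroughV2Catalog:
    """Hypothetical v2-style catalog that keeps no state and delegates to v1."""

    def __init__(self, session_catalog: SessionCatalog) -> None:
        self._session = session_catalog

    def load_table(self, identifier: Tuple[str, str]) -> dict:
        # Unpack the (database, table) identifier and forward to the v1 catalog.
        database, table = identifier
        return self._session.get_table_metadata(database, table)

    def create_table(self, identifier: Tuple[str, str], metadata: dict) -> dict:
        # A CTAS-style create lands in the same store that v1 code paths read.
        database, table = identifier
        self._session.create_table(database, table, metadata)
        return metadata


session = SessionCatalog()
default_catalog = PassThroughV2Catalog(session)
default_catalog.create_table(("db", "events"), {"schema": "id INT"})
```

Because the wrapper delegates rather than copies, a table created through the v2 path is immediately visible to anything still using the session catalog directly, which is the property a default catalog would need during the v1 to v2 transition.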
Bucketed joins: [Ed: I don’t remember much of this, feel free to expand what was said]
Andrew: looks like lots of work to be done for bucketing. Sort removals aren’t done, and bucketing with non-bucketed tables still incurs hashing costs.
Ryan: work on support for Hive bucketing appears to have stopped, so it doesn’t look like this is an easy area to improve
Where should join optimization be done?
Andrew will create a prototype PR.
Case sensitivity in catalogs: should catalogs report case sensitivity to Spark?
Ryan: catalogs connect to external systems, so Spark can’t impose case sensitivity requirements. A catalog is either case sensitive or not; a requirement from Spark would only force it to violate Spark’s assumption.
Ryan: requiring a catalog to report whether it is case sensitive doesn’t actually help Spark. If the catalog is case sensitive, then Spark should pass exactly what it received to avoid changing the meaning. If the catalog is case insensitive, then Spark can pass exactly what it received because case is handled in the catalog. So Spark’s behavior doesn’t change.
Russel: not all catalogs are case sensitive or case insensitive. Some are case insensitive unless an identifier is quoted. Quoted parts are case sensitive.
Ryan: So a catalog would not be able to return true or false correctly.
Conclusion: Spark should pass identifiers that it received, without modification.
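The argument above can be sketched with three toy catalogs. All class and function names here are hypothetical and only illustrate the reasoning; none of them are Spark APIs. The quote-aware catalog assumes double quotes mark a case-sensitive identifier and that unquoted names fold to lower case, which is one possible convention among several.

```python
class CaseSensitiveCatalog:
    """Hypothetical catalog that matches names exactly."""

    def __init__(self, names):
        self._names = set(names)

    def table_exists(self, name):
        return name in self._names


class CaseInsensitiveCatalog:
    """Hypothetical catalog that folds every name to lower case."""

    def __init__(self, names):
        self._names = {n.lower() for n in names}

    def table_exists(self, name):
        return name.lower() in self._names


class QuoteAwareCatalog:
    """Hypothetical catalog: case insensitive unless the identifier is quoted."""

    def __init__(self, names):
        self._names = {self._normalize(n) for n in names}

    @staticmethod
    def _normalize(name):
        if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
            return name[1:-1]  # quoted: strip quotes, keep case as written
        return name.lower()    # unquoted: fold case

    def table_exists(self, name):
        return self._normalize(name) in self._names


def spark_lookup(catalog, identifier):
    # Pass the identifier exactly as received; the catalog applies its own rules.
    return catalog.table_exists(identifier)


sensitive = CaseSensitiveCatalog(["Events"])
insensitive = CaseInsensitiveCatalog(["Events"])
quote_aware = QuoteAwareCatalog(['"Events"', "logs"])
```

With the same unmodified `spark_lookup`, `sensitive` finds `Events` but not `events`, `insensitive` finds `EVENTS`, and `quote_aware` finds `LOGS` and `"Events"` but not `events`. No single true/false flag describes the third catalog, and the caller never needed one.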