Remove SaveMode from file sources: Blocked by TableProvider/CatalogPlugin changes. Doesn’t work with all of the USING sources from v1, like JDBC. A CatalogPlugin fix is in progress.
Reorganize packages: Blocked by outstanding INSERT INTO PRs
Docs: Ryan: docs can be written after branching, so focus should be on stability right now
Any other blockers? Please send them to Ryan to track
V2 session catalog config PR:
Wenchen: this will be included in CatalogPlugin changes
DESCRIBE TABLE PR:
Matt: waiting for review
Burak: partitioning is strange, uses “Part 0” instead of names
Ryan: there are no names for transform partitions (identity partitions use column names)
Conclusion: not a big problem since there is no required schema; we can update later if better ideas come up
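For illustration, a hedged sketch of why positional labels appear, assuming a simplified transform representation (Transform and partitionLabel are illustrative names, not the actual Spark classes):

```scala
// Illustrative only: v2 partitioning is a sequence of transforms. Only an
// identity transform maps directly to a column name; other transforms
// (bucket, days, ...) have no single name, hence a positional label.
case class Transform(name: String, args: Seq[String])

def partitionLabel(t: Transform, ordinal: Int): String = t match {
  case Transform("identity", Seq(col)) => col              // e.g. "ts"
  case _                               => s"Part $ordinal" // e.g. "Part 0"
}
```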
INSERT INTO PR:
Ryan: ready for another review, DataFrameWriter.insertInto PR will follow
SHOW TABLES PR:
Terry: there are open questions: what is the current database for v2?
Ryan: there should be a current namespace in the SessionState. This could be per catalog?
Conclusion: do not track current namespace per catalog. Reset to a catalog default when current catalog changes
Ryan: will add a SupportsNamespaces method for the default namespace, used to initialize the current namespace.
Burak: USE foo.bar could set both
What if SupportsNamespaces is not implemented? Default to Seq.empty
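A minimal sketch of the idea, assuming a SupportsNamespaces-style trait that exposes a default namespace (illustrative names, not necessarily the final API):

```scala
// Illustrative only: initialize the current namespace from the catalog's
// default when it becomes the current catalog; fall back to Seq.empty if the
// catalog does not expose a default namespace.
trait SupportsNamespaces {
  def defaultNamespace: Array[String]
}

def initialNamespace(catalog: AnyRef): Seq[String] = catalog match {
  case ns: SupportsNamespaces => ns.defaultNamespace.toSeq
  case _                      => Seq.empty
}
```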
Terry: should listing methods support search patterns?
Ryan: this adds complexity that should be handled by Spark instead of complicating the API. There isn’t a performance need to push this down because we don’t expect high cardinality for a namespace level.
Conclusion: implement in SHOW TABLES exec
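A minimal sketch of filtering in the exec node, assuming the usual SHOW TABLES pattern syntax ('*' wildcard, '|' for alternatives); the helper name is illustrative:

```scala
// Illustrative only: apply the pattern to the listed table names in Spark,
// instead of pushing the pattern down to the catalog.
def filterByPattern(names: Seq[String], pattern: String): Seq[String] = {
  val regexes = pattern.split("\\|").map(p => p.trim.replace("*", ".*").r)
  names.filter(name => regexes.exists(_.pattern.matcher(name).matches()))
}

// filterByPattern(Seq("sales", "stats", "tmp1"), "s*")  // Seq("sales", "stats")
```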
Terry: how should temporary tables be handled?
Wenchen: temporary table is an alias for temporary view. SHOW TABLES does list temporary views, so v2 should implement the same behavior.
Terry: support EXTENDED?
Ryan: This can be done later.
DELETE FROM PR:
Wenchen: DELETE FROM just passes filters to the data source to delete
Ryan: Instead of a complicated builder, let’s solve just the simple case (filters) and not the row-level delete case. If we do that, then we can use a simple SupportsDelete interface and put off row-level delete design
Consensus was to add a SupportsDelete interface for Table and not a new builder
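A minimal sketch of the consensus, assuming delete conditions arrive as pushed-down source Filters (an illustrative shape, not necessarily the final interface):

```scala
import org.apache.spark.sql.sources.Filter

// Illustrative only: a table mixes this in to delete every row matching the
// given filters; no builder, no row-level delete support.
trait SupportsDelete {
  def deleteWhere(filters: Array[Filter]): Unit
}
```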
Stats push-down fix:
Ryan: briefly looked into it; this can probably be done earlier, in the optimizer, by creating the scan early and wrapping it in a special logical plan. This isn’t a good long-term solution but would fix stats for the release. The write side would not change.
Ryan will submit a PR with the implementation
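A hedged sketch of the wrapping idea, assuming a leaf logical plan that carries the statistics reported by a scan created early in the optimizer (ScanStatsWrapper is an illustrative name, not an actual Spark class):

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}

// Illustrative only: wrap the early-created scan's reported statistics so the
// optimizer sees realistic size/row-count estimates before physical planning.
case class ScanStatsWrapper(
    output: Seq[Attribute],
    reported: Statistics) extends LeafNode {
  override def computeStats(): Statistics = reported
}
```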
Using ALTER TABLE implementations for v1:
Burak: Took a stab at this, but ran into problems. Would be nice if all DDL for v1 were supported through the v2 API
DDL doesn’t work with v1 for custom data sources when the source of truth is not Hive
Matt: v2 should be used to change the source of truth. v1 behavior is to only change the session catalog (e.g., Hive).
Matt: is v1 deprecated?
Wenchen: not until v2 is stable
Burak: can’t deprecate yet
Burak: CTAS and RTAS could also call v1
Ryan: We could build a v2 implementation that calls v1, but only append and read could be supported because v1 overwrite behavior is unreliable across sources.
Ran out of time
Wenchen’s CatalogPlugin changes can be discussed next time
Ryan will follow up with Raymond about reusing Parquet read path in other v2 sources