Public v2 interface location


Public v2 interface location

Ryan Blue

Hi everyone,

In the DSv2 sync this week, we discussed adding a new SQL module, sql-api, that would contain the interfaces for authors to plug in external sources. The rationale for adding this package is that the common logical plans and rules to validate those plans should live in Catalyst, but no classes in catalyst are currently public for plugin authors to extend. Catalyst would depend on the sql-api module to pull in the interfaces that plugin authors implement.

I was just working on moving the proposed TableCatalog interface into a new sql-api module, but I ran into a problem: the new APIs still need to reference classes in Catalyst, like DataType/StructType, AnalysisException, and Statistics.
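To illustrate the coupling, here is a rough Java sketch. Everything in it is hypothetical. StructType and AnalysisException are tiny stand-ins for the Catalyst classes named above; Statistics is left out for brevity. The Table and TableCatalog interfaces are simplified placeholders, not the proposed TableCatalog API. The sketch only shows that plugin-facing signatures mention Catalyst types, so whichever module holds these interfaces must be able to see those classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Catalyst's StructType; only field names are modeled.
final class StructType {
    final String[] fieldNames;

    StructType(String... fieldNames) {
        this.fieldNames = fieldNames;
    }
}

// Hypothetical stand-in for Spark's AnalysisException.
// Unchecked here only to keep the sketch short.
class AnalysisException extends RuntimeException {
    AnalysisException(String message) {
        super(message);
    }
}

// Simplified, hypothetical plugin-facing interfaces (NOT the proposed API).
// Note that their signatures reference Catalyst types.
interface Table {
    String name();

    StructType schema(); // return type is a Catalyst class
}

interface TableCatalog {
    Table loadTable(String name) throws AnalysisException;

    Table createTable(String name, StructType schema, Map<String, String> properties)
        throws AnalysisException;
}

// A toy plugin implementation, just so the interfaces can be exercised.
class InMemoryTableCatalog implements TableCatalog {
    private final Map<String, Table> tables = new HashMap<>();

    @Override
    public Table loadTable(String name) {
        Table table = tables.get(name);
        if (table == null) {
            throw new AnalysisException("Table not found: " + name);
        }
        return table;
    }

    @Override
    public Table createTable(String name, StructType schema, Map<String, String> properties) {
        if (tables.containsKey(name)) {
            throw new AnalysisException("Table already exists: " + name);
        }
        Table table = new Table() {
            @Override public String name() { return name; }
            @Override public StructType schema() { return schema; }
        };
        tables.put(name, table);
        return table;
    }
}
```

If these interfaces lived in a separate sql-api module, that module would need StructType and AnalysisException on its compile classpath, which is the dependency described above.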

I don’t think it makes sense to move all of the referenced classes into sql-api as well, but I could be convinced otherwise. If we decide not to move them, then that leaves us back where we started: we can either expose the v2 API from the catalyst package, or we can keep the v2 API, logical plans, and rules in core instead of catalyst.
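The dependency problem can be modeled as a directed graph of modules, assuming the build rejects cyclic dependencies between modules, as Maven and sbt do. The following Java sketch is illustrative only: ModuleGraph and the two layouts are hypothetical, and the edges are simplified. Keeping the referenced classes in Catalyst produces the cycle catalyst -> sql-api -> catalyst, while moving them into sql-api removes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of build modules and their "depends on" edges.
final class ModuleGraph {
    private final Map<String, List<String>> deps = new HashMap<>();

    // Records that module `from` needs classes from module `to`.
    ModuleGraph dependsOn(String from, String to) {
        deps.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        deps.computeIfAbsent(to, k -> new ArrayList<>());
        return this;
    }

    // True if some module transitively depends on itself, i.e. there is
    // no valid order in which to compile the modules.
    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String module : deps.keySet()) {
            if (visit(module, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; reaching a module already on the current path
    // means we followed a back edge, which closes a cycle.
    private boolean visit(String module, Set<String> finished, Set<String> onPath) {
        if (onPath.contains(module)) {
            return true;
        }
        if (finished.contains(module)) {
            return false;
        }
        onPath.add(module);
        for (String dep : deps.get(module)) {
            if (visit(dep, finished, onPath)) {
                return true;
            }
        }
        onPath.remove(module);
        finished.add(module);
        return false;
    }
}

final class ModuleLayouts {
    // sql-api holds the plugin interfaces, but they reference StructType,
    // AnalysisException, etc., which stay in catalyst.
    static ModuleGraph referencedClassesStayInCatalyst() {
        return new ModuleGraph()
            .dependsOn("core", "catalyst")
            .dependsOn("catalyst", "sql-api")   // plans and rules use the interfaces
            .dependsOn("sql-api", "catalyst");  // interfaces use Catalyst types
    }

    // Same layout, but the referenced classes are moved into sql-api.
    static ModuleGraph referencedClassesMovedToSqlApi() {
        return new ModuleGraph()
            .dependsOn("core", "catalyst")
            .dependsOn("catalyst", "sql-api");
    }
}
```

The two fallbacks above (exposing the API from catalyst, or keeping it in core) add no new module, so they raise no cycle. Their trade-offs concern public visibility and where plans and rules live, which a graph like this does not capture.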

Anyone want to weigh in with a preference for how to move forward?

rb

--
Ryan Blue
Software Engineer
Netflix

Re: Public v2 interface location

JackyLee
Hi, Ryan Blue.

I don't think it would be a good idea to add the sql-api module.
I would prefer to put sql-api in sql/core. SQL is just another
representation of Dataset, so there is no need to add a new module for
this. Besides, it would be easier to add sql-api to core.

By the way, I don't think now is a good time to add the SQL API; we have
not yet settled many details of the DataSource V2 API.



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Public v2 interface location

Ryan Blue
Jacky,

The proposal to add a sql-api module was based on the need to have the SQL API classes, like `Table`, available to Catalyst so we can have logical plans and analyzer rules in that module. But nothing in Catalyst is public, so it doesn't contain user-implemented APIs. There are 3 options to solve that problem:

1. Add a module with the APIs, sql-api, that catalyst depends on. But I ran into the problem I described above: the APIs need to depend on Catalyst classes.
2. Add the API to catalyst. The problem is adding publicly available API classes to a previously non-public module.
3. Add the API to core. The problem here is that it is more difficult to keep rules and logical plans in catalyst, where I would expect them to be.

I'm not sure which option is the right one, but I no longer think that option #1 is very promising.

On Fri, Nov 30, 2018 at 10:47 PM JackyLee <[hidden email]> wrote:
Hi, Ryan Blue.

I don't think it would be a good idea to add the sql-api module.
I would prefer to put sql-api in sql/core. SQL is just another
representation of Dataset, so there is no need to add a new module for
this. Besides, it would be easier to add sql-api to core.

By the way, I don't think now is a good time to add the SQL API; we have
not yet settled many details of the DataSource V2 API.



--
Ryan Blue
Software Engineer
Netflix