Datasource V2 support in Spark 3.x

Datasource V2 support in Spark 3.x

Mihir Sahu
Hi Team,

    Before developing a new data source for Spark 3.x, I wanted to know: should it be built using Datasource V2, Datasource V1 (via Relation), or is there another plan?

    When I tried to build a data source using V2 for Spark 3.0, I could not find the associated classes; they seem to have been moved out. However, I am able to build new data sources with Datasource V1.

     I wanted to know the path ahead for data sources, so that I can build and contribute accordingly.

Thanks
Regards
Mihir Sahu
Re: Datasource V2 support in Spark 3.x

cloud0fan
Data Source V2 has evolved into the Connector API, which supports both data (the data source API) and metadata (the catalog API). The new APIs are under the package org.apache.spark.sql.connector.
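
For reference, a minimal read-only source against the new package could look roughly like this (class names such as MySourceProvider and MyTable are made up for illustration; the scan side is sketched further down in this reply):

import java.util

import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read.ScanBuilder
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Entry point that Spark discovers via a fully-qualified class name
// (or a short name registered through META-INF/services).
class MySourceProvider extends TableProvider {
  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    new StructType().add("id", "long").add("value", "string")

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = new MyTable(schema)
}

// The Table exposes both metadata (name, schema) and capabilities.
class MyTable(tableSchema: StructType) extends Table with SupportsRead {
  override def name(): String = "my_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    new MyScanBuilder(tableSchema) // defined in the scan sketch below
}

With the class on the classpath, it can be loaded with spark.read.format(classOf[MySourceProvider].getName).load().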

You can keep using Data Source V1, as there is no plan to deprecate it in the near future. But if you'd like to try something new (like integrating with your metadata), please take a look at the new Connector API.
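
If metadata integration is what you're after, the catalog side is wired up through configuration. A hypothetical setup (com.example.MyCatalog standing in for a class that implements org.apache.spark.sql.connector.catalog.TableCatalog) would be:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("connector-api-demo")
  // Register a catalog plugin under the name "my_cat"; the class is assumed
  // to implement the TableCatalog interface from the connector package.
  .config("spark.sql.catalog.my_cat", "com.example.MyCatalog")
  .getOrCreate()

// Tables in the custom catalog are then addressable by multi-part names.
spark.sql("SELECT * FROM my_cat.ns.some_table").show()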

Note that it's still evolving, and API changes may happen in the next release. We hope to stabilize it soon, but we are still working on some designs, like a stable API to represent data (currently we are using InternalRow).
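
To make the InternalRow point concrete, here is a rough sketch of the scan side referenced above (again with illustrative names, a single fixed partition, and three hard-coded rows):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{Batch, InputPartition, PartitionReader, PartitionReaderFactory, Scan, ScanBuilder}
import org.apache.spark.sql.types.StructType
import org.apache.spark.unsafe.types.UTF8String

class MyScanBuilder(schema: StructType) extends ScanBuilder {
  override def build(): Scan = new Scan with Batch {
    override def readSchema(): StructType = schema
    override def toBatch(): Batch = this
    // A single partition is enough for the sketch.
    override def planInputPartitions(): Array[InputPartition] =
      Array(new InputPartition {})
    override def createReaderFactory(): PartitionReaderFactory =
      new PartitionReaderFactory {
        override def createReader(p: InputPartition): PartitionReader[InternalRow] =
          new PartitionReader[InternalRow] {
            private var i = -1L
            override def next(): Boolean = { i += 1; i < 3 }
            // Data crosses the boundary as InternalRow, so strings have to be
            // UTF8String, timestamps microseconds, and so on.
            override def get(): InternalRow =
              InternalRow(i, UTF8String.fromString(s"row-$i"))
            override def close(): Unit = ()
          }
      }
  }
}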
