Datasource v2 Select Into support

Datasource v2 Select Into support

Ross Lawley
Hi,

I hope this is the correct mailing list. I've been adding v2 support to the MongoDB Spark connector using Spark 2.3.1. I've noticed that one of my tests passes when using the original DefaultSource but errors with my v2 implementation:

The code I'm running is:
val df = spark.loadDS[Character]()
df.createOrReplaceTempView("people")
spark.sql("INSERT INTO table people SELECT 'Mort', 1000")

The error I see is:
unresolved operator 'InsertIntoTable DataSourceV2Relation [name#0, age#1], MongoDataSourceReader ...
'InsertIntoTable DataSourceV2Relation [name#0, age#1], MongoDataSourceReader ....
+- Project [Mort AS Mort#7, 1000 AS 1000#8]
   +- OneRowRelation

My DefaultSource V2 implementation extends DataSourceV2 with ReadSupport with ReadSupportWithSchema with WriteSupport.
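
For reference, a minimal skeleton of that class against the Spark 2.3.1 sources.v2 API looks roughly like this (method bodies elided, and the MongoDB-specific reader and writer types omitted):

import java.util.Optional

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.sources.v2._
import org.apache.spark.sql.sources.v2.reader.DataSourceReader
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
import org.apache.spark.sql.types.StructType

class DefaultSource extends DataSourceV2
    with ReadSupport with ReadSupportWithSchema with WriteSupport {

  // Called when Spark should infer the schema from the source
  override def createReader(options: DataSourceOptions): DataSourceReader = ???

  // Called when the user supplies a schema explicitly
  override def createReader(schema: StructType, options: DataSourceOptions): DataSourceReader = ???

  // Called on the write path (e.g. DataFrameWriter.save)
  override def createWriter(jobId: String, schema: StructType, mode: SaveMode,
      options: DataSourceOptions): Optional[DataSourceWriter] = ???
}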

I'm wondering if there is something I'm not implementing, if there is a bug in my implementation, or if it's an issue with Spark?

Any pointers would be great,

Ross

Re: Datasource v2 Select Into support

cloud0fan (Wenchen Fan)
Data source v2 catalog support (table/view) is still in progress. There are several threads on the dev list discussing it; please join the discussion if you are interested. Thanks for trying it out!


Re: Datasource v2 Select Into support

Ryan Blue
Ross,

The problem you're hitting is that there aren't many logical plans that work with the v2 source API yet. Here, you're creating an InsertIntoTable logical plan from SQL, which can't be converted to a physical plan because there is no rule to convert it to the corresponding v2 logical plan, AppendData.
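
Conceptually, the missing piece is a small analysis rule. Here's a sketch of roughly what such a rewrite could look like, using plan names from the in-progress v2 work (AppendData does not exist in 2.3.1, and a real rule would also have to validate the incoming columns against the table schema):

import org.apache.spark.sql.catalyst.plans.logical.{AppendData, InsertIntoTable, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// Hypothetical rule: rewrite the parser's InsertIntoTable over a v2
// relation into the new AppendData plan, matching columns by position
object ResolveV2Insert extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case InsertIntoTable(rel: DataSourceV2Relation, _, query, _, _) if query.resolved =>
      AppendData.byPosition(rel, query)
  }
}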

What we need to do next is get the new v2 logical plans, like CreateTableAsSelect and ReplacePartitionsDynamic, into Spark, and then add analysis rules that convert the plans created by the SQL parser to those v2 plans. The reason we're adding new logical plans is to clearly define the behavior of these queries, including the rules that validate that data can be written and how each operation is implemented, such as creating the table before writing to it for CTAS.

I currently have a working implementation of this, but getting it into upstream Spark is blocked on the issue to add the TableCatalog API. I'd love to get that in so we can submit more of our implementation.
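
For context, TableCatalog is a small interface for table CRUD that the new plans delegate to. A rough sketch of the surface under discussion (names and signatures approximate, including the stand-in Identifier type; nothing here is a shipped Spark API):

import org.apache.spark.sql.types.StructType

// Stand-in identifier type for this sketch
case class Identifier(database: Option[String], name: String)

trait Table {
  def name: String
  def schema: StructType
  def properties: Map[String, String]
}

trait TableCatalog {
  def loadTable(ident: Identifier): Table
  def createTable(ident: Identifier, schema: StructType, properties: Map[String, String]): Table
  def dropTable(ident: Identifier): Boolean
}

With something like that in place, CTAS becomes: create the table through the catalog, run the append against the created table, and drop the table again if the write fails.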

rb

--
Ryan Blue
Software Engineer
Netflix