Spark SQL Data Sources API


dyzhou
I'm primarily interested in using the Data Sources API for *remote* data sources, e.g. the Redshift data source (https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html). For this use case, it seems to work much like Hive storage handlers, i.e. you create a non-native table for each remote table you want to read or write -- is this correct?
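
For concreteness, this is the kind of per-table registration I mean (Spark 1.x syntax taken from the blog post above; the URL, credentials, and S3 tempdir are placeholders):

    // Register one remote Redshift table as a Spark SQL table via the
    // Data Sources API -- one non-native table per remote table.
    sqlContext.sql("""
      CREATE TEMPORARY TABLE my_table
      USING com.databricks.spark.redshift
      OPTIONS (
        url     'jdbc:redshift://host:5439/db?user=u&password=p',
        dbtable 'my_table',
        tempdir 's3n://bucket/tmp/'
      )
    """)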

Ideally, I am looking for something similar to *catalogs* in Presto. A catalog points to a remote host (not just a single table) and lets you query any table on that host using the catalog_name.table_name syntax. Do you know whether there is something similar in Spark SQL, and/or do you have any suggestions on how it could be implemented?
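
To illustrate, the closest workaround I can think of is a helper that lists the tables on a remote host over JDBC and registers each one as a temp table under a catalog prefix. Everything below is hypothetical -- the helper name and the underscore naming convention are mine (temp table names can't contain dots), and it lacks the lazy, per-query resolution a real catalog would have:

    import java.sql.DriverManager
    import org.apache.spark.sql.SQLContext

    // Hypothetical sketch: emulate a Presto-style catalog by registering
    // every table found on a remote JDBC host as a Spark SQL temp table
    // named <catalog>_<table>. Names and the JDBC URL are illustrative.
    def registerJdbcCatalog(sqlContext: SQLContext,
                            catalog: String,
                            jdbcUrl: String): Unit = {
      val conn = DriverManager.getConnection(jdbcUrl)
      try {
        // Enumerate plain tables via standard JDBC metadata.
        val rs = conn.getMetaData.getTables(null, null, "%", Array("TABLE"))
        while (rs.next()) {
          val table = rs.getString("TABLE_NAME")
          // One non-native table per remote table, as with the Redshift source.
          sqlContext.read
            .format("jdbc")
            .option("url", jdbcUrl)
            .option("dbtable", table)
            .load()
            .registerTempTable(s"${catalog}_${table}")
        }
      } finally {
        conn.close()
      }
    }

    // Usage (hypothetical host); afterwards: SELECT * FROM pg_orders
    // registerJdbcCatalog(sqlContext, "pg",
    //   "jdbc:postgresql://host/db?user=u&password=p")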

Thanks in advance for any response.