Support SqlStreaming in spark

Support SqlStreaming in spark

JackyLee
Hello

Nowadays, more and more streaming products support SQL streaming, such as
Kafka SQL (KSQL), Flink SQL and Storm SQL. Supporting SQL streaming not only
lowers the barrier to entry for streaming, but also makes streaming easier
for everyone to adopt.

At present, Structured Streaming is relatively mature and is based on the
Dataset API, which makes it possible to provide a SQL entry point for
Structured Streaming and to run Structured Streaming queries in SQL.

To support SQL streaming, there are two key points:
1. The parser should be able to parse streaming-type SQL.
2. The analyzer should be able to map metadata information to the
corresponding relation.
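
To make the two points concrete, here is a toy sketch in plain Python. It is
not Spark code and not the proposed implementation: the catalog contents, the
TableMeta and Relation classes, and the regex-based "parser" are all
hypothetical stand-ins, assuming only that a table's stored metadata says
whether it is a streaming source.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical catalog entry describing a source or sink."""
    fmt: str                                   # e.g. "kafka" or "parquet"
    options: dict = field(default_factory=dict)
    is_streaming: bool = False


@dataclass
class Relation:
    """Toy stand-in for a resolved relation in a logical plan."""
    table: str
    fmt: str
    options: dict
    streaming: bool


# Hypothetical catalog: table name -> source/sink metadata.
CATALOG = {
    "clicks": TableMeta("kafka", {"subscribe": "clicks"}, is_streaming=True),
    "users": TableMeta("parquet", {"path": "/data/users"}),
}

_FROM = re.compile(r"\bFROM\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def parse_table(sql):
    """Point 1 (parsing): extract the table name from a simple SELECT ... FROM."""
    match = _FROM.search(sql)
    if match is None:
        raise ValueError(f"no FROM clause found in: {sql!r}")
    return match.group(1)


def resolve(sql, catalog=CATALOG):
    """Point 2 (analysis): map the table's catalog metadata to a relation."""
    table = parse_table(sql)
    if table not in catalog:
        raise KeyError(f"table not found: {table}")
    meta = catalog[table]
    return Relation(table, meta.fmt, meta.options, meta.is_streaming)
```

In this sketch, resolve("SELECT * FROM clicks") yields a relation marked as
streaming, while the same call against "users" yields a batch relation; the
query text itself does not need to say which one it is.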

Running Structured Streaming in SQL brings several benefits:
1. It lowers the barrier to entry for Structured Streaming and attracts
users more easily.
2. Source and sink metadata can be encapsulated in tables, maintained and
managed uniformly, and made more accessible to users.
3. Metadata permission management, which is based on Hive, allows tighter
control over the overall authorization scheme for Structured Streaming.

We have found some ways to solve this problem and would be glad to discuss
them with you.

Thanks,

Jacky Lee



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Re: Support SqlStreaming in spark

JackyLee
The repo you linked may solve some of the SqlStreaming problems, but it is not user-friendly enough: users need to learn its new syntax.

--
Jacky Lee

At 2018-06-15 11:48:01, "Bowden, Chris" <[hidden email]> wrote:

Not sure if there is a question in here, but if you are hinting that Structured Streaming should support a SQL interface, Spark has appropriate extensibility hooks to make it possible. However, the most powerful constructs in Structured Streaming are quite difficult to find a SQL equivalent for (e.g., flatMapGroupsWithState). This repo could use some cleanup, but it is an example of providing a SQL interface to a subset of Structured Streaming's functionality: https://github.com/vertica/pstl/blob/master/pstl/src/main/antlr4/org/apache/spark/sql/catalyst/parser/pstl/PstlSqlBase.g4.



From: JackyLee <[hidden email]>
Sent: Thursday, June 14, 2018 7:06:17 PM
To: [hidden email]
Subject: Support SqlStreaming in spark
 

Re: Support SqlStreaming in spark

Shixiong(Ryan) Zhu
In reply to this post by JackyLee
Structured Streaming supports the same standard SQL as batch queries, so users can easily switch their queries between batch and streaming. Could you clarify what problems SqlStreaming solves and what the benefits of the new syntax are?
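
For illustration, a minimal PySpark sketch of that switch, assuming `spark` is an existing SparkSession passed in by the caller; the view name `events`, the column names, and the input path are hypothetical. The SQL text is identical in both functions and only the reader (`read` vs. `readStream`) and the way results are consumed differ.

```python
# Sketch only: `spark` is an existing SparkSession supplied by the caller;
# the view name, column names, and input path are hypothetical.
QUERY = "SELECT action, count(*) AS n FROM events GROUP BY action"
SCHEMA = "action STRING, ts TIMESTAMP"  # DDL-style schema string


def run_batch(spark, path):
    """Run QUERY once over the JSON files currently under `path`."""
    spark.read.schema(SCHEMA).json(path).createOrReplaceTempView("events")
    return spark.sql(QUERY)


def run_streaming(spark, path):
    """Run the same QUERY continuously as new JSON files arrive under `path`."""
    spark.readStream.schema(SCHEMA).json(path).createOrReplaceTempView("events")
    return (
        spark.sql(QUERY)
        .writeStream
        .outputMode("complete")  # an aggregation without a watermark cannot use append mode
        .format("console")
        .start()
    )
```

Note that the source definition (format, schema, path) still lives in DataFrame API calls here, not in SQL.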

Best Regards,

Ryan

On Thu, Jun 14, 2018 at 7:06 PM, JackyLee <[hidden email]> wrote:


Re: Support SqlStreaming in spark

JackyLee
Spark JIRA:
https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24630

Benefits:

First, users who are unfamiliar with streaming can easily use SQL to run
Structured Streaming, especially when migrating offline tasks to real-time
processing tasks.
Second, supporting a SQL API in Structured Streaming also allows Structured
Streaming to be combined with Hive. Users can store source/sink metadata in
a table and use the Hive metastore to manage it. Users who want to read this
data can easily create a stream by accessing the table, which can greatly
reduce the development and maintenance costs of Structured Streaming.
Finally, it makes unified management and access control of sources and sinks
easy to achieve, and makes the handling of private data more controllable,
especially in areas such as finance or security.




Re: Support SqlStreaming in spark

JackyLee
The code for SQLStreaming has been pushed:

https://github.com/apache/spark/pull/22575


