[KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?


Jacek Laskowski
Hi,

I've just found out that KafkaSourceProvider supports a topic option
that sets the Kafka topic a DataFrame is saved to.

You can also use a topic column to assign rows to topics.

Given these features, I've been wondering why the "path" option is not
supported, even with the lowest precedence, so that when neither a topic
column nor a topic option is defined, the path from save(path: String)
would be used as a last resort.

WDYT?

It looks pretty trivial to support; see KafkaSourceProvider at
lines [1] and [2], if I'm not mistaken.

[1] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145
[2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163
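
The fallback order proposed above can be sketched in plain Scala, without Spark on the classpath. This is only an illustration, not the code in KafkaSourceProvider: TopicResolution and resolveTopic are hypothetical names, and the sketch assumes that the topic option overrides a topic column, with "path" consulted only when neither is present.

```scala
// Hypothetical sketch of the proposed fallback order; not Spark's actual code.
object TopicResolution {
  /** Picks the target topic for one row.
    *
    * @param rowTopic value of the row's topic column, if the column exists and is non-null
    * @param options  writer options, e.g. Map("topic" -> "events")
    */
  def resolveTopic(rowTopic: Option[String], options: Map[String, String]): Option[String] =
    options.get("topic")            // 1. explicit topic option (assumed to override the column)
      .orElse(rowTopic)             // 2. per-row topic column
      .orElse(options.get("path"))  // 3. proposed: path from save(path) as the last resort
}
```

With this ordering, TopicResolution.resolveTopic(None, Map("path" -> "events")) yields Some("events"), while any topic option or topic column still wins over the path.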

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

Cody Koeninger-2
I'm confused about what you're suggesting.  Are you saying that a
Kafka sink should take a filesystem path as an option?

On Mon, May 1, 2017 at 8:52 AM, Jacek Laskowski <[hidden email]> wrote:

> Hi,
>
> I've just found out that KafkaSourceProvider supports topic option
> that sets the Kafka topic to save a DataFrame to.
>
> You can also use topic column to assign rows to topics.
>
> Given the features, I've been wondering why "path" option is not
> supported (even of least precedence) so when no topic column or option
> are defined, save(path: String) would be the least priority.
>
> WDYT?
>
> It looks pretty trivial to support --> see KafkaSourceProvider at
> lines [1] and [2] if I'm not mistaken.
>
> [1] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145
> [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

Michael Armbrust
He's just suggesting that since the DataStreamWriter start() method can fill in an option named "path", we should make that a synonym for "topic". Then you could do something like:

df.writeStream.format("kafka").start("topic")

Seems reasonable if people don't think that is confusing.
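
To make the mechanism concrete, here is a minimal model that runs without Spark. It assumes, as described above, that start(path) simply records its argument under the "path" option. SketchWriter is a hypothetical stand-in rather than DataStreamWriter, and its start() returns the resolved topic instead of starting a query.

```scala
// Hypothetical stand-in for a writer builder; it only tracks options.
final case class SketchWriter(options: Map[String, String] = Map.empty) {
  // Returns a new writer with one more option set.
  def option(key: String, value: String): SketchWriter =
    copy(options = options + (key -> value))

  // Convenience overload: store the argument as the "path" option, then start.
  def start(path: String): Option[String] = option("path", path).start()

  // Returns the topic the sink would write to: "topic" wins, "path" is the fallback synonym.
  def start(): Option[String] = options.get("topic").orElse(options.get("path"))
}
```

In this model, SketchWriter().start("events") resolves to Some("events"), and an explicit option("topic", ...) still takes priority over the path argument.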

On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <[hidden email]> wrote:
> I'm confused about what you're suggesting.  Are you saying that a
> Kafka sink should take a filesystem path as an option?


Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

Jacek Laskowski
Hi, 

Thanks Cody and Michael! I didn't expect to get two answers so quickly, and from THE brains behind the Spark-Kafka integration. #impressed

Yes, Michael has nailed it. Using save's path was so natural to me after months with Spark that I was surprised not to see it supported, only the custom and surely not very obvious topic option.

Imagine my day today when I discovered that I could use KafkaSource in batch queries and then suddenly found out that save does not support path. I'm not faint-hearted so I survived :-)

I think that change would make KafkaSource even cooler. Please add support if possible (and make it part of the upcoming 2.2.0, too!)

Thanks. 

Jacek

On 1 May 2017 7:26 p.m., "Michael Armbrust" <[hidden email]> wrote:
> He's just suggesting that since the DataStreamWriter start() method can
> fill in an option named "path", we should make that a synonym for "topic".
> Then you could do something like.
>
> df.writeStream.format("kafka").start("topic")
>
> Seems reasonable if people don't think that is confusing.


Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

Cody Koeninger-2
Yeah, seems reasonable.

On Mon, May 1, 2017 at 12:40 PM, Jacek Laskowski <[hidden email]> wrote:

> Hi,
>
> Thanks Cody and Michael! I didn't expect to get two answers so quickly and
> from THE brains behind spark - Kafka integration. #impressed
>
> Yes, Michael has nailed it. Using save's path was so natural to me after
> months with Spark that I was surprised to not have seen it instead of the
> custom and surely not very obvious topic.
>
> Imagine my day today when I'd discovered that I could use KafkaSource in
> batch queries and then suddenly found out about no support for path in save.
> I'm not faint-hearted so I survived :-)
>
> I think that change would make KafkaSource even cooler. Please add support
> if possible (and make it part of the upcoming 2.2.0, too!)
>
> Thanks.
>
> Jacek

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

Jacek Laskowski
https://issues.apache.org/jira/browse/SPARK-20597

I'm going to send a PR soon.

On Mon, May 1, 2017 at 8:26 PM, Cody Koeninger <[hidden email]> wrote:

> Yeah, seems reasonable.