[Structured Streaming] Kafka group.id is fixed


Anastasios Zouzias
Hi all,

I have run into the following situation with Spark Structured Streaming (SS) using Kafka.

In a project that I work on, there is already a secured Kafka setup where ops can issue an SSL certificate per "group.id", which should be predefined (or, at least, its prefix should be predefined).

On the other hand, Spark SS fixes the group.id to 

val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"

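see, e.g.,

https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
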
I guess the Spark developers had a good reason to fix it, but would it be possible to make the prefix of the above uniqueGroupId ("spark-kafka-source") configurable? If so, I could prepare a PR for it.

The rationale is that we do not want all Spark jobs to share the same certificate for group ids of the form spark-kafka-source-*.


Best regards,
Anastasios Zouzias

Re: [Structured Streaming] Kafka group.id is fixed

Cody Koeninger-2
That sounds reasonable to me
On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias <[hidden email]> wrote:

>
> Hi all,
>
> I have run into the following situation with Spark Structured Streaming (SS) using Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where ops can issue an SSL certificate per "group.id", which should be predefined (or, at least, its prefix should be predefined).
>
> On the other hand, Spark SS fixes the group.id to
>
> val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
>
> see, e.g.,
>
> https://github.com/apache/spark/blob/v2.4.0/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L124
>
> I guess the Spark developers had a good reason to fix it, but would it be possible to make the prefix of the above uniqueGroupId ("spark-kafka-source") configurable? If so, I could prepare a PR for it.
>
> The rationale is that we do not want all Spark jobs to share the same certificate for group ids of the form spark-kafka-source-*.
>
>
> Best regards,
> Anastasios Zouzias

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [Structured Streaming] Kafka group.id is fixed

Tom Graves-2
In reply to this post by Anastasios Zouzias
This makes sense to me; I was going to propose something similar in order to use the Kafka ACLs more effectively as well. Can you file a JIRA for it?

Tom


Re: [Structured Streaming] Kafka group.id is fixed

Anastasios Zouzias
Hi Tom,

I initiated an issue here: https://issues.apache.org/jira/browse/SPARK-26121

Feel free to edit/update the ticket. If someone familiar with the codebase has any suggestion on the proper way of fixing this, I could work on it.

Best,
Anastasios

On Mon, Nov 19, 2018 at 4:31 PM Tom Graves <[hidden email]> wrote:
This makes sense to me; I was going to propose something similar in order to use the Kafka ACLs more effectively as well. Can you file a JIRA for it?

Tom


Re: [Structured Streaming] Kafka group.id is fixed

Cody Koeninger-2
Anastasios, it looks like you already identified the two lines that
need to change, the string interpolation that depends on
UUID.randomUUID and metadataPath.hashCode.

I'd factor that out into a function that returns the group id.  That
function would also need to take the "parameters" variable (the map of
user-provided options) and look for a prefix for the group id,
defaulting to the current behavior.
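
The refactoring described above might look roughly like the following. This is only a minimal sketch, not the actual Spark change: the object name GroupIdSketch, the option key "groupIdPrefix", and the case-insensitive key lookup are assumptions made for illustration, since the thread does not name an option. Only the default prefix and the UUID/hashCode interpolation come from the snippet quoted earlier in the thread.

```scala
import java.util.UUID

object GroupIdSketch {
  // Hypothetical option key (an assumption; the thread does not name one).
  val GroupIdPrefixOption = "groupIdPrefix"

  // Prefix taken from the Spark 2.4.0 snippet quoted in the thread.
  val DefaultPrefix = "spark-kafka-source"

  /** Builds a unique group id, using a user-provided prefix when one is set. */
  def uniqueGroupId(parameters: Map[String, String], metadataPath: String): String = {
    // Look up the prefix among the user-provided options, ignoring key case
    // (a choice of this sketch); fall back to the current hard-coded prefix.
    val prefix = parameters
      .collectFirst { case (k, v) if k.equalsIgnoreCase(GroupIdPrefixOption) => v }
      .getOrElse(DefaultPrefix)

    // Same uniqueness scheme as today: random UUID plus the metadata path hash.
    s"$prefix-${UUID.randomUUID}-${metadataPath.hashCode}"
  }
}
```

With no prefix among the options, the result keeps the current "spark-kafka-source-..." form, so existing jobs would be unaffected. With a prefix such as "team-a", the group id starts with "team-a-", which ops could then match against a predefined certificate or ACL.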

If you have questions, feel free to ping me on the jira, or get as far
as you can and submit a PR for more discussion.
On Mon, Nov 19, 2018 at 2:38 PM Anastasios Zouzias <[hidden email]> wrote:

>
> Hi Tom,
>
> I initiated an issue here: https://issues.apache.org/jira/browse/SPARK-26121
>
> Feel free to edit/update the ticket. If someone familiar with the codebase has any suggestion on the proper way of fixing this, I could work on it.
>
> Best,
> Anastasios