[DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

jkleckner
This story [1] proposes adding a .gitlab-ci.yml file to make it easy to create artifacts and images for spark.

Using this mechanism, people can submit any subsequent version of spark for building and image hosting with gitlab.com.

There is a companion WIP branch [2] with a candidate and example for doing this.
The exact steps for building are in the yml file [3].
The images get published into the user's namespace, as shown here [4].

One value of this is the ability to create versions of dependent packages such as spark-on-k8s-operator that might use upgraded packages or modifications for testing.  For example, this was used to build a version of spark that included SPARK-28938 which has yet to be released and was necessary for spark-operator to work properly with GKE service accounts [5].

Comments about desirability?


Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

Dongjoon Hyun-2
Hi, Jim.

Thank you for the proposal. I understand the request.
However, the following key benefit sounds like unofficial snapshot binary releases.

> For example, this was used to build a version of spark that included SPARK-28938 which has yet to be released and was necessary for spark-operator to work properly with GKE service accounts

Historically, we removed the existing snapshot binaries in some personal repositories, and there is no plan to add them back.
Also, for snapshot dev jars, we use only the official Apache Maven snapshot repository.

For official releases, we aim to release Apache Spark source code (and its artifacts) according to the pre-defined release cadence in an official manner.

BTW, SPARK-28938 doesn't mean that we need to publish a Docker image. Even in the official release, as you know, we only provide a reference Dockerfile. That's the reason why we don't publish a Docker image via GitHub Actions (as of today).

To achieve the following custom requirement, I'd recommend maintaining your own Dockerfile.
That is the best way for you to keep that flexibility.

> One value of this is the ability to create versions of dependent packages such as spark-on-k8s-operator

Thanks,
Dongjoon.



Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

Sean Owen-2
Yeah the color on this is that 'snapshot' or 'nightly' builds are not
quite _discouraged_ by the ASF, but need to be something only devs are
likely to find and clearly signposted, because they aren't official
blessed releases. It gets into a gray area if the project is
'officially' hosting a way to get snapshot builds. It is not at all
impossible, just something that's come up and generated some angst in
the past, so we dropped it.

On Thu, Jan 23, 2020 at 9:32 AM Jim Kleckner <[hidden email]> wrote:
> [1] https://issues.apache.org/jira/browse/SPARK-30275
> [2] https://gitlab.com/jkleckner/spark/tree/add-gitlab-ci-yml
> [3] https://gitlab.com/jkleckner/spark/blob/add-gitlab-ci-yml/.gitlab-ci.yml
> [4] https://gitlab.com/jkleckner/spark/container_registry
> [5] https://gitlab.com/jkleckner/spark-on-k8s-operator/container_registry


Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

jkleckner
I understand that "non-dev" persons could become confused and that some sort of signposting/warning makes sense.

Certainly, I consider my personal registry on gitlab.com ephemeral and not intended for publishing.
We have our own private instance of GitLab where I put derived artifacts, and this was needed to work with GKE, as mentioned, since 2.4.4 does not work out of the box with service accounts the way we use them.

I can keep this file as a branch of my own that I manually merge when needed if others don't find this useful or the risk of confusion is greater than the value.

In that case, simply close the JIRA as not desirable: https://issues.apache.org/jira/browse/SPARK-30275

And now there are discussions both in email and JIRA...

Jim





Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

Erik Erlandson-2
Can a '.gitlab-ci.yml' be considered code, in the same way that the k8s-related Dockerfiles are code? In other words, something like: "here is a piece of code you might choose to use for building your own binaries, that is not specifically endorsed by Apache Spark"? So it would not be involved in the creation of nightly binaries by Apache Spark per se, but could be used by individuals for that purpose.


Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

jkleckner
Sure, it seems like an optional thing to me. Spark has a Jenkins setup for building and testing. This would only affect someone who pushes the code to GitLab.

I'm happy to keep the commit in a small private branch of my own that I apply when I need to do an out-of-cycle build. I just thought others might find it useful. If not, then not a problem.

