Increase the number of parallel jobs in GitHub Actions at ASF organization level


Hyukjin Kwon
Hi all,

I am an Apache Spark PMC, and would like to know the future plan about GitHub Actions in ASF.
Please also see the INFRA ticket I filed: https://issues.apache.org/jira/browse/INFRA-21646.

I am aware of the limited GitHub Actions resources that are shared across all projects in ASF, and many projects suffer from it. This issue significantly slows down the development cycle of other projects, at least Apache Spark.

How do we plan to increase the resources in GitHub Actions, and what are the blockers? I would appreciate any input and thoughts on this.

Thank you so much.

CC'ing Spark [hidden email] for more visibility. Please take it out if considered inappropriate.

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Jarek Potiuk
Just a comment here, as I also commented in the ticket.

The document https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status gives a complete overview of where GitHub Actions stands for ASF projects.

We also have some good experience in Apache Airflow with running our own self-hosted runners, which we will likely be able to share soon. More in this comment: https://issues.apache.org/jira/browse/INFRA-21646?focusedCommentId=17316108&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17316108


J.





Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Greg Stein
In reply to this post by Hyukjin Kwon
On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon <[hidden email]> wrote:
> Hi all,
>
> I am an Apache Spark PMC,

You are a member of the Apache Spark PMC. You are *not* a PMC. Please stop with that terminology. The Foundation has about 200 PMCs, and you are a member of one of them. You are NOT a "PMC" .. you're a person. A PMC is a construct of the Foundation.

>...
> I am aware of the limited GitHub Actions resources that are shared
> across all projects in ASF,
> and many projects suffer from it. This issue significantly slows down the
> development cycle of
> other projects, at least Apache Spark.

And the Foundation gets those build minutes for GitHub Actions provided to us from GitHub and Microsoft, and we are thankful that they provide them to the Foundation. Maybe it isn't all the build minutes that every group wants, but that is what we have. So it is incumbent upon all of us to figure out how to build more, with fewer minutes.

Say "thank you" to GitHub, please.

Regards,
-g


Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Hyukjin Kwon
Hi Greg,

I raised this thread to figure out how we can work together to resolve this issue, to gather feedback, and to understand how other projects work around it.
Several projects I observed have, as far as I can tell, made sufficient effort to save GitHub Actions resources but still suffer from the lack of resources.
I appreciate the resources provided to us, but that does not resolve the issue of development being slowed down.



Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Martin Grigorov


On Wed, Apr 7, 2021 at 3:41 PM Hyukjin Kwon <[hidden email]> wrote:
> Hi Greg,
>
> I raised this thread to figure out a way that we can work together to
> resolve this issue, gather feedback, and to understand how other projects
> work around.
> Several projects I observed, as far as I can tell, have made enough efforts
> to save the resources in GitHub Actions but still suffer from the lack of
> resources.

And it will get even worse because:
1) more and more Apache projects are migrating from Travis CI to GitHub Actions (GA)
2) new projects are joining the ASF, and many of them already use GA

What was your reason for migrating from Apache Jenkins to GitHub Actions?
If you want dedicated resources, then you will need to manage the CI yourself.
You could use Apache Jenkins/Buildbot with dedicated agents for your project.
Or you could set up your own CI infrastructure with Jenkins, DroneIO, Concourse CI, ...

Yet another option is to move to CircleCI or Cirrus. They are similar to Travis CI / GA and less crowded (for now).

Martin

> I appreciate the resources provided to us but that does not resolve the
> issue of the development being slowed down.



Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Hyukjin Kwon
Thanks Martin for your feedback.

> What was your reason to migrate from Apache Jenkins to Github Actions ? 

I am sure there were more reasons for migrating from AMPLab Jenkins <https://amplab.cs.berkeley.edu/jenkins/> to GitHub Actions, but as far as I can remember:
- To reduce the maintenance cost of the machines.
- The Jenkins machines became unstable and slow, causing CI jobs to fail or be very flaky.
- Difficulty managing the installed libraries.
- Intermittent unknown issues on the machines.

Yes, one option might be to migrate again to something else. However, other projects will very likely suffer from the same problem. In addition, migrating a large project is not easy work.

I would like to know the feasibility of having more resources in GitHub Actions or, for example, having sub-groups where each group shares the resources. Currently one GitHub organisation shares all resources across the projects.



Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

cloud0fan
> for example, having sub-groups where each group shares the resources - currently one GitHub organisation shares all resources across the projects.

That's a good idea. We do need to thank GitHub for giving free resources to ASF projects, but it would be better if we could make it a business: we allow individual projects to sign deals with GitHub to get dedicated resources. It's a bit wasteful to ask every project to set up its own DevOps; using GitHub Actions is more convenient. Maybe we should raise it with GitHub?


Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Hyukjin Kwon
- builds

FYI, the CC to Spark dev was dropped during the discussion. If you are not subscribed to [hidden email], you have seen only part of the discussion.
Please subscribe to the [hidden email] mailing list to participate further in the discussion.



Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

shane knapp ☠
In reply to this post by Hyukjin Kwon


On Wed, Apr 7, 2021 at 6:30 AM Hyukjin Kwon <[hidden email]> wrote:
> Thanks Martin for your feedback.
>
>> What was your reason to migrate from Apache Jenkins to Github Actions ?
>
> I am sure there were more reasons for migrating from AMPLab Jenkins to GitHub Actions but as far as I can remember:
> - To reduce the maintenance cost of machines
> - The Jenkins machines became unstable and slow causing CI jobs to fail or be very flaky.
> - Difficulty to manage the installed libraries.
> - Intermittent unknown issues in the machines

also:

- uc berkeley has been hosting the build system for spark for ~10 years "free of charge"
- funding for the build system is going away (amplab funded first, riselab second)
- i have been managing the build system solo for 7 years and my job is much different now...
- since there are no funds coming from research labs, i am unable to staff the build system past 2021 (tbh, even this year is a stretch)
- the hardware is far past EOL and literally falling over
- jenkins is, and always will be a PITA to run

shane
--
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

Jarek Potiuk
In reply to this post by cloud0fan

> That's a good idea. We do need to thank Github to give free resources to
> ASF projects, but it's better if we can make it a business: we allow
> individual projects to sign deals with Github to get dedicated resources.
> It's a bit wasteful to ask every project to set up its own dev ops,
> using Github Action is more convenient. Maybe we should raise it to Github?

I do not think you can get per-project resources in GitHub. The most you can do is set up self-hosted runners for your project.

(BTW, I am not from the INFRA team, just a humble "CI person" of Apache Airflow, but very much invested in GitHub Actions.)
Maybe the infra team can chime in here. We did raise it with GitHub; we even had a meeting with them, organized by Gavin, and several topics were raised that GitHub could eventually address:

- observability: they could not give us a per-project usage dashboard, so Tobiasz from Airflow built our own, which is imperfect because of API limitations
- security (limiting access to only project committers): we handled this with Ash's fork of the Runner, but it is also imperfect; even today I had to fix a problem where the list of committers was desynchronised between our infra/CI.yml
- manageability (assigning resources per project): this works by having self-hosted runners assigned per project (we needed an infra JIRA ticket, the generation of a bunch of tokens for our runners, and our own AWS account with auto-scaling)

It would indeed be great if this were available from GitHub, but so far we do not have any of those.

J.
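
Editor's note: the observability point above (a home-built per-project usage view) can be illustrated with a minimal sketch. This is not the dashboard mentioned in the thread. It assumes run records have already been fetched and flattened into dictionaries; the field names `repository`, `run_started_at` and `updated_at` are loosely modeled on GitHub workflow-run data and are assumptions here, as is the sample data. Wall-clock time per run is only a rough proxy for consumed build minutes, since a run can contain parallel jobs.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Timestamps are assumed to look like "2021-04-07T07:24:00Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def minutes_per_project(runs):
    """Sum wall-clock minutes per repository from flattened run records."""
    totals = defaultdict(float)
    for run in runs:
        started = parse_ts(run["run_started_at"])
        finished = parse_ts(run["updated_at"])
        # Clamp negative durations caused by inconsistent records.
        elapsed = max((finished - started).total_seconds(), 0.0)
        totals[run["repository"]] += elapsed / 60.0
    return dict(totals)


# Hypothetical sample records, not real usage data.
sample_runs = [
    {"repository": "apache/spark",
     "run_started_at": "2021-04-07T07:00:00Z",
     "updated_at": "2021-04-07T07:45:00Z"},
    {"repository": "apache/spark",
     "run_started_at": "2021-04-07T08:00:00Z",
     "updated_at": "2021-04-07T08:30:00Z"},
    {"repository": "apache/airflow",
     "run_started_at": "2021-04-07T09:00:00Z",
     "updated_at": "2021-04-07T09:30:00Z"},
]

usage = minutes_per_project(sample_runs)
for repo, minutes in sorted(usage.items()):
    print(f"{repo}: {minutes:.1f} min")
```

A real version would page through the API for every repository in the organisation, which is where the API limitations mentioned above would likely show up.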

 