Jenkins upgrade/Test Parallelization & Containerization

Jenkins upgrade/Test Parallelization & Containerization

xin lu
Hi everyone,

I tried sending emails to this list and I'm not sure if they went through, so I'm trying again. Anyway, a couple of months ago, before I left Databricks, I was working on a proof of concept that parallelized Spark tests on Jenkins. The way it worked was basically that it built the Spark jars and then ran all the tests in Docker containers on a bunch of slaves in parallel. This cut the testing time down from 4 hours to approximately 1.5 hours. This required a newer version of Jenkins and the Jenkins Pipeline plugin. I am wondering if it is possible to do this on AMPLab Jenkins. It looks like https://builds.apache.org/ has upgraded, so AMPLab's Jenkins is a year or so behind. I am happy to help with this project if it is something that people think is worthwhile.

Thanks

Xin
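
The fan-out described in this message (build once, then run each module's tests in its own container in parallel) can be sketched roughly as below. This is only an illustration, not the proof of concept itself: the original used the Jenkins Pipeline plugin across several machines, whereas this sketch uses a local thread pool, and the image name `spark-test-env`, the `/workspace` mount, and the `build/mvn test -pl <module>` invocation are all assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def docker_test_command(module, image="spark-test-env", workspace="/workspace"):
    """Build the argv for running one module's tests in a throwaway container.

    The image name, mount point, and Maven invocation are hypothetical;
    they are not taken from the proof of concept discussed in the thread.
    """
    return [
        "docker", "run", "--rm",          # remove the container when it exits
        "-v", f"{workspace}:{workspace}",  # share the pre-built workspace
        "-w", workspace,                   # run from inside the workspace
        image,
        "build/mvn", "test", "-pl", module,
    ]


def run_command(argv):
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(argv, check=False).returncode


def run_modules_in_parallel(modules, runner=run_command, max_workers=4):
    """Run every module's test command concurrently; map module -> exit code."""
    commands = [docker_test_command(m) for m in modules]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        exit_codes = list(pool.map(runner, commands))
    return dict(zip(modules, exit_codes))
```

Calling `run_modules_in_parallel(["core", "mllib"])` would start one container per module; passing a different `runner` makes it possible to try the scheduling logic without Docker installed.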

Re: Jenkins upgrade/Test Parallelization & Containerization

Holden Karau
Just wanting to +1 this idea. One potential option would be to look at migrating away from the AMP Lab Jenkins infra into the ASF infra. I've added Josh, Shane, and Sean to the CC line explicitly since I think they might have opinions about this.


On Tue, Oct 31, 2017 at 11:05 PM, Xin Lu <[hidden email]> wrote:
Hi everyone,

I tried sending emails to this list and I'm not sure if they went through, so I'm trying again. Anyway, a couple of months ago, before I left Databricks, I was working on a proof of concept that parallelized Spark tests on Jenkins. The way it worked was basically that it built the Spark jars and then ran all the tests in Docker containers on a bunch of slaves in parallel. This cut the testing time down from 4 hours to approximately 1.5 hours. This required a newer version of Jenkins and the Jenkins Pipeline plugin. I am wondering if it is possible to do this on AMPLab Jenkins. It looks like https://builds.apache.org/ has upgraded, so AMPLab's Jenkins is a year or so behind. I am happy to help with this project if it is something that people think is worthwhile.

Thanks

Xin




Re: Jenkins upgrade/Test Parallelization & Containerization

Holden Karau
Oh, and to be clear, part of my +1 is that dockerized Spark builds would simplify a lot of the headaches we face trying to coordinate changes on the PySpark side (it's not just "oooh, shiny, faster build times", although that's pretty compelling in itself).

On Tue, Nov 7, 2017 at 11:20 AM, Holden Karau <[hidden email]> wrote:
Just wanting to +1 this idea. One potential option would be to look at migrating away from the AMP Lab Jenkins infra into the ASF infra. I've added Josh, Shane, and Sean to the CC line explicitly since I think they might have opinions about this.



Re: Jenkins upgrade/Test Parallelization & Containerization

Sean Owen
In reply to this post by xin lu
Faster tests would be great. I recall that the straightforward ways to parallelize via Maven haven't worked because many tests collide with one another. Is this about running each module's tests in a container? That should work.

I can see how this is becoming essential for repeatable and reliable Python/R builds, which depend on the environment to a much greater extent than the JVM does.

I don't have a strong preference for AMPLab vs ASF builds. I suppose using the ASF machinery is a little tidier. If it's got a later Jenkins that's required, also a plus, but I assume updating AMPLab isn't so hard here either. I think the key issue is which environment is easier to control and customize over time.


Re: Jenkins upgrade/Test Parallelization & Containerization

Holden Karau
True, I think we've seen that the AMP Lab Jenkins needs to be more focused on running AMP Lab projects, and while I don't know how difficult the ASF Jenkins is to work with, I assume it might be an easier place to make changes going forward? (Of course, this could be a case of the grass being greener on the other side, and I don't mean to say it's been hard to make changes on the AMP Lab hardware; folks have been amazingly helpful. It's just that the projects on each have different needs.)

On Tue, Nov 7, 2017 at 12:52 PM, Sean Owen <[hidden email]> wrote:
Faster tests would be great. I recall that the straightforward ways to parallelize via Maven haven't worked because many tests collide with one another. Is this about running each module's tests in a container? That should work.

I can see how this is becoming essential for repeatable and reliable Python/R builds, which depend on the environment to a much greater extent than the JVM does.

I don't have a strong preference for AMPLab vs ASF builds. I suppose using the ASF machinery is a little tidier. If it's got a later Jenkins that's required, also a plus, but I assume updating AMPLab isn't so hard here either. I think the key issue is which environment is easier to control and customize over time.



Re: Jenkins upgrade/Test Parallelization & Containerization

rxin
My understanding is that AMP can actually provide more resources or adapt to changes, while the ASF needs to manage 200+ projects and it's hard for it to accommodate much. I could be wrong though.


On Tue, Nov 7, 2017 at 2:14 PM, Holden Karau <[hidden email]> wrote:
True, I think we've seen that the AMP Lab Jenkins needs to be more focused on running AMP Lab projects, and while I don't know how difficult the ASF Jenkins is to work with, I assume it might be an easier place to make changes going forward? (Of course, this could be a case of the grass being greener on the other side, and I don't mean to say it's been hard to make changes on the AMP Lab hardware; folks have been amazingly helpful. It's just that the projects on each have different needs.)


Re: Jenkins upgrade/Test Parallelization & Containerization

Holden Karau
That makes sense. In that case, do we know how hard it would be to make the necessary changes to the AMP Lab Jenkins to support this?

On Tue, Nov 7, 2017 at 4:04 PM Reynold Xin <[hidden email]> wrote:
My understanding is that AMP can actually provide more resources or adapt to changes, while the ASF needs to manage 200+ projects and it's hard for it to accommodate much. I could be wrong though.



Re: Jenkins upgrade/Test Parallelization & Containerization

Felix Cheung
In reply to this post by Holden Karau
We actually have some immediate needs for custom config for some upcoming integration tests.

I don’t know if such changes are possible in ASF Jenkins but the work is in progress in RISELab Jenkins :)




Re: Jenkins upgrade/Test Parallelization & Containerization

Sean Owen
On a related note, I believe Hyukjin has been requesting access to trigger tests on amplab Jenkins for some time. Would be great to add him. 

On Wed, Nov 8, 2017, 4:25 AM Felix Cheung <[hidden email]> wrote:
We actually have some immediate needs for custom config for some upcoming integration tests.

I don’t know if such changes are possible in ASF Jenkins but the work is in progress in RISELab Jenkins :)




Re: Jenkins upgrade/Test Parallelization & Containerization

xin lu
Yeah Sean, so the setup I had didn't really care about parallelizing in Maven. It just stashed the built artifacts and moved them onto the slaves running the tests, and the tests for each submodule ran in a separate Docker container. After each sub-build was done, the build logs were transferred back and aggregated. I actually don't know much about the permissions and capabilities of the ASF Jenkins or how projects get on there, but as long as we can use AMPLab Jenkins the same way with Docker containers, I think it is totally fine to stay there.

Xin
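
The stash, transfer, and log-aggregation flow described in this message might look something like the following. It is a minimal sketch under stated assumptions: the actual setup used Jenkins Pipeline steps rather than hand-written Python, and the function names, the tar.gz archive format, and the one-`.log`-file-per-module layout are all hypothetical choices made for this example.

```python
import tarfile
from pathlib import Path


def stash(artifact_dir, archive_path):
    """Pack the built artifacts into one archive that can be copied to test workers."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to artifact_dir, not absolute ones.
        tar.add(artifact_dir, arcname=".")
    return archive_path


def unstash(archive_path, dest_dir):
    """Unpack a stashed archive into a worker's workspace."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def aggregate_logs(log_dir, combined_path):
    """Concatenate per-module logs (assumed: one *.log file each) into one report.

    Returns the module names in the order they were written.
    """
    log_files = sorted(Path(log_dir).glob("*.log"))
    with open(combined_path, "w", encoding="utf-8") as out:
        for log_file in log_files:
            out.write(f"===== {log_file.stem} =====\n")
            out.write(log_file.read_text(encoding="utf-8"))
            out.write("\n")
    return [log_file.stem for log_file in log_files]
```

Keeping the build step separate from the test step in this way means the jars are compiled only once, and each container only needs to unpack the archive before running its share of the tests.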

On Wed, Nov 8, 2017 at 1:57 AM, Sean Owen <[hidden email]> wrote:
On a related note, I believe Hyukjin has been requesting access to trigger tests on amplab Jenkins for some time. Would be great to add him. 



Re: Jenkins upgrade/Test Parallelization & Containerization

shane knapp
hey all, i'm finally back from vacation this week and will be following up once i whittle down my inbox.

in summation:  jenkins worker upgrades will be happening.  the biggest one is the move to ubuntu...  we need containerized builds for this, but i don't have the cycles to really do all of this on my own.

also, i'm planning on moving to a newer jenkins asap, as well as a reinstall of the master w/ubuntu.  this will absolutely disrupt builds, but i have some pretty good ideas on how to mitigate downtime.

anyways, give me a couple of days to get caught up...  2 weeks w/o any internet was amazing, but my inbox is suffering from it.  ;)

shane

On Fri, Nov 10, 2017 at 12:08 AM, Xin Lu <[hidden email]> wrote:
Yeah Sean, so the setup I had didn't really care about parallelizing in Maven. It just stashed the built artifacts and moved them onto the slaves running the tests, and the tests for each submodule ran in a separate Docker container. After each sub-build was done, the build logs were transferred back and aggregated. I actually don't know much about the permissions and capabilities of the ASF Jenkins or how projects get on there, but as long as we can use AMPLab Jenkins the same way with Docker containers, I think it is totally fine to stay there.

Xin
