Spark build is failing in amplab Jenkins


Spark build is failing in amplab Jenkins

Pralabh Kumar
Hi Dev

Spark build is failing in Jenkins



Python versions prior to 2.7 are not supported.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?

Please help


Regards
Pralabh Kumar

Re: Spark build is failing in amplab Jenkins

Sean Owen
Agree, seeing this somewhat regularly on the pull request builder. Do some machines inadvertently have Python 2.6? Some builds succeed, so it may just be one or a few machines. CC Shane.


Re: Spark build is failing in amplab Jenkins

Hyukjin Kwon
I assume it is as it says:

Python versions prior to 2.7 are not supported.

Looks like this happens on workers 2, 6 and 7, from my observation.




Re: Spark build is failing in amplab Jenkins

xin lu
It has happened with other workers as well, namely 3 and 4, which then recovered. Looking at the build history, it looks like a project called mango has been added to this pool of machines recently.

It looks like the slaves start to fail Spark pull request builds after some runs of mango.

Xin




Re: Spark build is failing in amplab Jenkins

xin lu
Sorry, mango wasn't added recently, but it looks like the workers break after successful builds of this specific configuration, and then recover after another configuration runs.

Xin





Re: Spark build is failing in amplab Jenkins

Frank Austin Nothaft
Hi folks,

Alyssa (cc’ed) and I manage the mango build on the AMPLab Jenkins. I will start looking into this to see what the connection between the mango builds and the failing Spark builds is.

Regards,

Frank Austin Nothaft
202-340-0466






Re: Spark build is failing in amplab Jenkins

xin lu
I'm not entirely sure if it's the cause because I can't see the build configurations, but just looking at the build logs, it looks like they share a pool and those mango builds run some Python setup.




Re: Spark build is failing in amplab Jenkins

Frank Austin Nothaft
Hi Xin!

Mango does install Python dependencies, but they should all be inside of a conda environment. My guess is that somewhere in the mango Jenkins build, something is getting installed outside of the conda environment. I'll be looking into this shortly.

Regards,

Frank Austin Nothaft




Re: Spark build is failing in amplab Jenkins

Frank Austin Nothaft
Hi Xin!

Alyssa and I chatted just now and both reviewed the mango build scripts. We don’t see anything in them that looks concerning. To give a bit more context, Mango is a Spark-based application for visualizing genomics data; it is built in Scala, but has Python language bindings and a node.js frontend. During CI, the mango build runs the following steps (a rough shell sketch follows the list):

• Creates a temp directory
• Runs maven to build the Java artifacts
• Copies the built artifacts into the temp directory, and cd’s into it. Inside the temp directory, we:
  • Create a temporary conda environment and install node.js into it
  • Pull down a pre-built distribution of Spark
  • Run our python build from inside the temp directory
• Once this is done, we:
  • Deactivate and remove the conda environment
  • Delete the temp directory
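
In shell terms, the flow is roughly the sketch below (directory names, versions, and the Spark URL are illustrative, not our exact Jenkins config):

# Illustrative sketch only -- names, versions, and URLs are hypothetical,
# not the exact mango Jenkins configuration.
WORKDIR=$(mktemp -d)
ENV_NAME="build-${BUILD_NUMBER}"              # one conda env per Jenkins build
mvn -DskipTests package                        # build the Java artifacts
cp target/mango-*.jar "${WORKDIR}" && cd "${WORKDIR}"
conda create -y -n "${ENV_NAME}" python=2.7    # temporary conda environment
source activate "${ENV_NAME}"
conda install -y -c conda-forge nodejs         # node.js inside the env
curl -sLO https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar xzf spark-2.2.0-bin-hadoop2.7.tgz          # pre-built Spark distribution
# ... run the python build from inside the temp directory ...
source deactivate                              # then tear everything down:
conda env remove -y -n "${ENV_NAME}"
cd && rm -rf "${WORKDIR}"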

This is very similar to the ADAM build, which has been running Python builds since mid-summer. We don’t manipulate any python dependencies outside of the conda environment, which we delete at the end of the build, so we are pretty confident that we’re not doing anything that should be breaking the PySpark builds.

To help debug, could you provide the path to the Python executables that get run during both a good and a bad build, as well as the Python versions? From our side (mango/ADAM), we’ve seen some oddness over the last few months with the environment on some of the Jenkins executors (things like JAVA_HOME getting changed), but we haven’t been able to root-cause those issues.

Regards,

Frank Austin Nothaft
202-340-0466





Re: Spark build is failing in amplab Jenkins

xin lu
Thanks. I actually don't have access to the machines or build configs to do proper debugging on this. It looks like these workers are shared with other build configurations like avocado and cannoli, and really any of the shared configs could be changing your JAVA_HOME and Python environments. It is fairly easy to debug if you can just change the Spark build to run "which python" and run it on one of the currently broken machines.
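
For example, something like these hypothetical lines at the top of the build's 'Execute shell' step (the output would show up in the console log):

echo "PATH=${PATH}"   # shows whether the anaconda bin dir survived on PATH
which python          # the interpreter the build will actually pick up
python -V             # its version: 2.6 on a broken worker, 2.7 on a good one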





Re: Spark build is failing in amplab Jenkins

xin lu
So, right now it looks like workers 2 and 6 are still broken, but 7 has recovered.


What I am suggesting is to just modify the SparkPullRequestBuilder configuration to run "which python" and then "python -V", to see what the pull request builder is seeing before it exits. Perhaps the pull request builders are erroneously targeting a different conda environment because you have multiple nodes on each worker. It looks like there is some build that's changing the environment, and that's causing the workers to break and recover somewhat randomly.

Xin

On Sun, Nov 5, 2017 at 8:29 AM, Alyssa Morrow <[hidden email]> wrote:
Hi Xin,

The full extent of the exports our projects set is:

export JAVA_HOME=/usr/java/jdk1.8.0_60
export CONDA_BIN=/home/anaconda/bin/
export MVN_BIN=/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/
export PATH=${JAVA_HOME}/bin/:${MVN_BIN}:${CONDA_BIN}:${PATH}

As for Python, "which python" gives us the python installed in the conda virtual environment:
~/.conda/envs/buildxxxx/bin/python

These steps look similar to how Spark sets up its build. Not sure if this helps. Let me know if any other information would be helpful.

Best,

Alyssa Morrow
414-254-6645



Re: Spark build is failing in amplab Jenkins

xin lu
Also, another thing to look at is whether you have any kind of nightly cleanup scripts for these workers that completely nuke the conda environments. If there is one, maybe that's why some of them recover after a while. I don't know enough about your infra right now to understand all the things that could cause the current unstable behavior, so these are just some guesses. Anyway, I sent a previous email about running Spark tests in Docker and no one responded. At Databricks the whole build infra for running Spark tests was very different: Spark tests were run in Docker, with a Jenkins dedicated to them. Perhaps that's something that can be replicated for OSS.





Re: Spark build is failing in amplab Jenkins

shane knapp
hello from the canary islands!  ;)

i just saw this thread, and another one about a quick power loss at the colo where our machines are hosted. the master is on UPS but the workers aren't... and when they come back, the PATH variable specified in the workers' configs gets dropped and we see behavior like this.

josh rosen (whom i am talking with over chat) will be restarting the ssh/worker processes on all of the worker nodes immediately.  this will fix the problem.

now, back to my holiday!  :)




Re: Spark build is failing in amplab Jenkins

Josh Rosen
Disconnecting and reconnecting each Jenkins worker appears to have resolved the PATH issue: in the System Info page for each worker, I now see a PATH which includes Anaconda.

To restart the worker processes, I only needed to hit the "Disconnect" button in the Jenkins master UI for each worker, wait a few seconds, then hit the "Relaunch slave agent" button. It's fortunate that this could be done entirely from the Jenkins UI without having to actually SSH into the individual worker machines.

It looks like all workers should now be in a good state. If you see any new failures due to PATH issues, though, then please ping this thread.
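
For reference, the same disconnect/relaunch could presumably also be scripted with the Jenkins CLI rather than clicked through the UI (a sketch; jenkins-cli.jar, credentials, and the worker name are assumptions):

# sketch: bounce one worker's agent from the command line
CLI="java -jar jenkins-cli.jar -s https://amplab.cs.berkeley.edu/jenkins/"
${CLI} disconnect-node amp-jenkins-worker-02
${CLI} connect-node amp-jenkins-worker-02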




Re: Spark build is failing in amplab Jenkins

xin lu
To prevent this in the future, you could set up something that checks whether each worker has the right PATH. If a worker doesn't satisfy the criteria, just mark the worker offline or restart the process automatically. It's doable with a maintenance script on the Jenkins master. When it fails like this, it doesn't even post anything back to GitHub, so contributors have no idea what to do.
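
A minimal sketch of such a maintenance script, assuming a cron job on the master with jenkins-cli.jar and credentials (the worker names and URL are illustrative):

#!/bin/bash
# sketch: take a worker out of rotation when its agent's PATH has lost anaconda
JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins
CLI="java -jar jenkins-cli.jar -s ${JENKINS_URL}"
for worker in amp-jenkins-worker-{01..08}; do
  # the worker's systemInfo page reports the PATH its agent process sees
  if ! curl -s "${JENKINS_URL}/computer/${worker}/systemInfo" | grep -q anaconda; then
    ${CLI} offline-node "${worker}" -m "PATH missing anaconda; agent needs relaunch"
  fi
done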
