
[VOTE] Apache Spark 2.1.1 (RC3)

[VOTE] Apache Spark 2.1.1 (RC3)

Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc3 (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1230/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.

What should happen to JIRA tickets still targeting 2.1.1?

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.1.2 or 2.2.0.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.0.

What happened to RC1?

There were issues with the release packaging, and as a result RC1 was skipped.

Re: [VOTE] Apache Spark 2.1.1 (RC3)

Nicholas Chammas

I had trouble starting up a shell with the AWS package loaded (specifically, org.apache.hadoop:hadoop-aws:2.7.3):


                [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)

        ==== local-m2-cache: tried

          file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar

                [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)

        ==== local-m2-cache: tried

          file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar

                [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)

        ==== local-m2-cache: tried

          file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar

                ::::::::::::::::::::::::::::::::::::::::::::::

                ::              FAILED DOWNLOADS            ::

                :: ^ see resolution messages for details  ^ ::

                ::::::::::::::::::::::::::::::::::::::::::::::

                :: com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle)

                :: org.codehaus.jettison#jettison;1.1!jettison.jar(bundle)

                :: com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar

                :: com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle)

                ::::::::::::::::::::::::::::::::::::::::::::::

Anyone know anything about this? I made sure to build Spark against the appropriate version of Hadoop.
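For context, a sketch of the kind of invocation that hits this resolution failure (the package coordinates and cache paths are from the log above; the install location and the cache-cleaning workaround are assumptions, not something confirmed in this thread):

```shell
# Hypothetical repro sketch: launching spark-shell with the hadoop-aws package.
# Ivy resolves hadoop-aws's transitive deps (jersey, jettison, jaxb-impl); the
# [NOT FOUND] lines above suggest the local ~/.m2 cache had metadata for these
# artifacts but not the jars, so resolution fails instead of re-downloading.
SPARK_HOME="${SPARK_HOME:-/opt/spark-2.1.1-rc3}"   # assumed install location
PKG="org.apache.hadoop:hadoop-aws:2.7.3"

# One common workaround (dry run here): purge the stale cache entries so Ivy
# re-fetches the jars from a remote repository on the next launch.
for d in com/sun/jersey org/codehaus/jettison com/sun/xml/bind; do
  echo "would remove: $HOME/.m2/repository/$d"    # drop the echo to actually clean
done

echo "$SPARK_HOME/bin/spark-shell --packages $PKG"
```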

Nick



Re: [VOTE] Apache Spark 2.1.1 (RC3)

Sean Owen
In reply to this post by Michael Armbrust
+1 from me -- this worked unusually smoothly on the first try.

Sigs and license and so forth look OK. Tests pass with Java 8, Ubuntu 17, -Phive -Phadoop-2.7 -Pyarn.

I had to run the build with -Xss2m to get this test to pass, but it might be somewhat specific to my env somehow:

- SPARK-16845: GeneratedClass$SpecificOrdering grows beyond 64 KB *** FAILED ***
  com.google.common.util.concurrent.ExecutionError: java.lang.StackOverflowError
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
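For anyone hitting the same StackOverflowError, one way to pass the larger stack size through the build is via MAVEN_OPTS; this is a hedged sketch, not the exact command Sean used, and whether the flag reaches a forked test JVM depends on the plugin's argLine configuration:

```shell
# Sketch: raise the JVM thread stack for the Spark Maven build.
# The SPARK-16845 test compiles deeply nested generated code, which can overflow
# the default thread stack; -Xss2m raises it to 2 MB. Assumption: the compiler
# and test JVMs honor MAVEN_OPTS in this setup.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m -Xss2m"
echo "build/mvn -Phive -Phadoop-2.7 -Pyarn -DskipTests package"
```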



Re: [VOTE] Apache Spark 2.1.1 (RC3)

Kazuaki Ishizaki
In reply to this post by Michael Armbrust
+1 (non-binding)

I tested it on Ubuntu 16.04 with OpenJDK 8 on ppc64le. All of the tests for core passed.

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Total number of tests run: 1788
Suites: completed 198, aborted 0
Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:38 min
[INFO] Finished at: 2017-04-19T18:17:43+09:00
[INFO] Final Memory: 56M/672M
[INFO] ------------------------------------------------------------------------

Regards,
Kazuaki Ishizaki



From:        Michael Armbrust <[hidden email]>
To:        "[hidden email]" <[hidden email]>
Date:        2017/04/19 04:00
Subject:        [VOTE] Apache Spark 2.1.1 (RC3)




Please vote on releasing the following candidate as Apache Spark version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc3 (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1230/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.

What should happen to JIRA tickets still targeting 2.1.1?

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.1.2 or 2.2.0.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.0.

What happened to RC1?

There were issues with the release packaging, and as a result RC1 was skipped.


Re: [VOTE] Apache Spark 2.1.1 (RC3)

Marcelo Vanzin
In reply to this post by Michael Armbrust
+1 (non-binding).

Ran the hadoop-2.6 binary against our internal tests and things look good.




--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [VOTE] Apache Spark 2.1.1 (RC3)

rxin
+1




Re: [VOTE] Apache Spark 2.1.1 (RC3)

Dong Joon Hyun
+1

I tested RC3 on CentOS 7.3.1611/OpenJDK 1.8.0_121/R 3.3.3
with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`

At the end of the R tests, I saw `Had CRAN check errors; see logs.`,
but the tests passed and the log file looks good.

Bests,
Dongjoon.




Re: [VOTE] Apache Spark 2.1.1 (RC3)

Denny Lee
+1 (non-binding)





Re: [VOTE] Apache Spark 2.1.1 (RC3)

Felix Cheung
Tested on both Linux and Windows, as package.

Found a StackOverflowError with ALS on Windows:
https://issues.apache.org/jira/browse/SPARK-20402

This is part of the R CRAN check that builds the vignettes. It is a very simple, quick, and consistent repro on Windows; the exact same code works fine on Linux. It reproduces the same error with 2.1.1 RC2, but we didn't see it before because it was blocked by a different issue at the time. The 2.1.0 release didn't have the ALS R API.

Will convert code to Scala to check and investigate further.

I'm not sure if we would consider this a blocker, but it might block R package release to CRAN.





Re: [VOTE] Apache Spark 2.1.1 (RC3)

Adam Roberts
+1 (non-binding), looks good

Tested on RHEL 7.2 and 7.3, CentOS 7.2, Ubuntu 14.04 and 16.04, SUSE 12, x86, IBM Linux on Power, and IBM Linux on Z (big-endian)

No problems with the latest IBM Java, Hadoop 2.7.3, and Scala 2.11.8; no performance concerns to report either (spark-sql-perf and HiBench)

Built with mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package
Tested with mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Dtest.exclude.tags=org.apache.spark.tags.DockerTest -fn test






Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [VOTE] Apache Spark 2.1.1 (RC3)

Nicholas Chammas
In reply to this post by Nicholas Chammas
Steve,

I think you're a good person to ask about this. Is the below any cause for concern? Or did I perhaps test this incorrectly?

Nick


On Tue, Apr 18, 2017 at 11:50 PM Nicholas Chammas <[hidden email]> wrote:

I had trouble starting up a shell with the AWS package loaded (specifically, org.apache.hadoop:hadoop-aws:2.7.3):


                [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)

        ==== local-m2-cache: tried

          file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar

                [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)

        ==== local-m2-cache: tried

          file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar

                [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)

        ==== local-m2-cache: tried

          file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar

                ::::::::::::::::::::::::::::::::::::::::::::::

                ::              FAILED DOWNLOADS            ::

                :: ^ see resolution messages for details  ^ ::

                ::::::::::::::::::::::::::::::::::::::::::::::

                :: com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle)

                :: org.codehaus.jettison#jettison;1.1!jettison.jar(bundle)

                :: com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar

                :: com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle)

                ::::::::::::::::::::::::::::::::::::::::::::::

Anyone know anything about this? I made sure to build Spark against the appropriate version of Hadoop.

Nick



Re: [VOTE] Apache Spark 2.1.1 (RC3)

Ryan Blue
-1 (non-binding)

Signatures, checksums, and release audit look fine, but there were 3 test failures in spark-hive. I'm attaching the test report.

I'm running Ubuntu 16.04 with Java 1.8.0_121-b13.

Tested with:
JAVA_HOME="/usr/lib/jvm/java-8-oracle/" build/mvn test -Dmaven.javadoc.skip=true -Dskip=true -Dsource.skip=true -P hadoop-2.7 -P hadoop-provided -P hive-provided -P hive-thriftserver -P scala-2.11 -P yarn

Also, SPARK-20202 is outstanding without a path forward for patch releases. Is the Spark community okay with patch releases continuing to depend on an unofficial fork of Hive? From the comments on the issue, this is still unclear. HIVE-16391 was opened to request a release of Hive 1.2.1 that Spark can depend on, but it doesn't look like any progress has been made toward resolving it.

rb

On Thu, Apr 20, 2017 at 6:59 AM, Nicholas Chammas <[hidden email]> wrote:
Steve,

I think you're a good person to ask about this. Is the dependency-resolution failure I reported above (the [NOT FOUND] errors for the jersey, jettison, and jaxb-impl artifacts when loading org.apache.hadoop:hadoop-aws:2.7.3) any cause for concern? Or did I perhaps test this incorrectly?

Nick


--
Ryan Blue
Software Engineer
Netflix


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: [VOTE] Apache Spark 2.1.1 (RC3)

Michael Allman-2
I want to caution that in testing a build from this morning's branch-2.1 we found that Hive partition pruning was not working: Spark SQL was fetching all Hive table partitions for a very simple query, whereas a build from several weeks ago fetched only the required partitions. I cannot currently think of a cause for the regression other than some difference between branch-2.1 as of our previous build and branch-2.1 as of this morning.

That's all I know right now. We are actively investigating to find the root cause of this problem, and specifically whether this is a problem in the Spark codebase or not. I will report back when I have an answer to that question.

Michael




Re: [VOTE] Apache Spark 2.1.1 (RC3)

Michael Allman-2
We've identified the cause of the change in behavior. It is related to the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key and its related functionality were absent from our previous build. The default setting in the current build was causing Spark to attempt to scan all table files during query analysis. Changing this setting to NEVER_INFER disabled that scan and resolved our issue.

Michael
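For reference, the workaround described above amounts to a single line in spark-defaults.conf (the value name follows the conf key discussed in this thread; double-check the accepted values for your particular build):

```properties
# Skip case-sensitive schema inference scans during query analysis
spark.sql.hive.caseSensitiveInferenceMode   NEVER_INFER
```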





Re: [VOTE] Apache Spark 2.1.1 (RC3)

Michael Armbrust
Thanks for pointing this out, Michael. Based on the conversation on the PR, this seems like a risky change to include in a release branch with a default other than NEVER_INFER.

+Wenchen?  What do you think?





Re: [VOTE] Apache Spark 2.1.1 (RC3)

Wenchen Fan
IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" mechanism scans all table files only once, and writes the inferred schema back to the metastore so that we don't need to do the schema inference again.

So technically this introduces a performance regression for the first query only; compared to branch-2.0 it is not a regression at all. And this patch fixed a regression in branch-2.1 for workloads that ran fine in branch-2.0. Personally, I think we should keep INFER_AND_SAVE as the default mode.

+ [Eric], what do you think?
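To make that trade-off concrete, here is a toy sketch in plain Python of when each inference mode pays the file-scan cost. None of these names are real Spark APIs; this only models the caching behavior described above.

```python
# Illustrative sketch only: models when each inference mode scans table
# files. The names below are invented for illustration, not Spark APIs.
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    saved_schema: dict = field(default_factory=dict)  # table -> schema


def load_schema(mode, metastore, table, scan_counter):
    """Return a schema for `table`, scanning data files per `mode`."""

    def scan_files():
        scan_counter[0] += 1           # expensive: reads every data file
        return {"Id": "bigint"}        # case-preserving schema from files

    if mode == "NEVER_INFER":
        return {"id": "bigint"}        # trust the (lower-cased) metastore
    if mode == "INFER_ONLY":
        return scan_files()            # infer every time, never persist
    if mode == "INFER_AND_SAVE":
        if table not in metastore.saved_schema:
            metastore.saved_schema[table] = scan_files()  # pay cost once
        return metastore.saved_schema[table]
    raise ValueError(mode)


ms = ToyMetastore()
scans = [0]
for _ in range(3):                     # three queries against one table
    load_schema("INFER_AND_SAVE", ms, "t", scans)
print(scans[0])                        # prints 1: only the first query scanned
```

Under INFER_AND_SAVE only the first query pays the scan; INFER_ONLY pays it on every query; NEVER_INFER never scans but loses the case-preserved column names.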






Re: [VOTE] Apache Spark 2.1.1 (RC3)

Holden Karau
What's the regression this fixed in 2.1 from 2.0?


--
Cell : 425-233-8271

Re: [VOTE] Apache Spark 2.1.1 (RC3)

Wenchen Fan

On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau <[hidden email]> wrote:
Whats the regression this fixed in 2.1 from 2.0?



Re: [VOTE] Apache Spark 2.1.1 (RC3)

Michael Allman-2
The trouble we ran into is that this upgrade was blocking access to our tables, and we didn't know why. This sounds like a kind of migration operation, but it was not apparent that this was the case. It took an expert examining a stack trace and source code to figure this out. Would a more naive end user be able to debug this issue?

Maybe we're an unusual case, but our particular experience was pretty bad. I doubt that schema inference on our largest tables would ever complete without hitting some kind of timeout (which we were in fact receiving) or the end user simply giving up and killing the job. We ended up rolling back while we investigated the source of the issue. In our case, NEVER_INFER is clearly the best configuration, and we're going to add it to our default configuration files.

My expectation is that a minor point release is a pretty safe bug fix release. We were a bit hasty in not doing better due diligence pre-upgrade.

One suggestion the Spark team might consider is releasing 2.1.1 with NEVER_INFER as the default and 2.2.0 with INFER_AND_SAVE. Clearly, some up-front migration notes would help users identify this new behavior in 2.2.

Thanks,

Michael





Re: [VOTE] Apache Spark 2.1.1 (RC3)

Holden Karau
It 

On Mon, Apr 24, 2017 at 10:33 AM, Michael Allman <[hidden email]> wrote:
The trouble we ran into is that this upgrade was blocking access to our tables, and we didn't know why. This sounds like a kind of migration operation, but it was not apparent that this was the case. It took an expert examining a stack trace and source code to figure this out. Would a more naive end user be able to debug this issue? Maybe we're an unusual case, but our particular experience was pretty bad. I have my doubts that the schema inference on our largest tables would ever complete without throwing some kind of timeout (which we were in fact receiving) or the end user just giving up and killing our job. We ended up doing a rollback while we investigated the source of the issue. In our case, INFER_NEVER is clearly the best configuration. We're going to add that to our default configuration files.

My expectation is that a minor point release is a pretty safe bug fix release. We were a bit hasty in not doing better due diligence pre-upgrade.

One suggestion the Spark team might consider is releasing 2.1.1 with NEVER_INFER and 2.2.0 with INFER_AND_SAVE. Clearly, some kind of up-front migration notes would help in identifying this new behavior in 2.2.
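For anyone who wants to try the workaround discussed in this thread, the setting can be pinned in spark-defaults.conf (config key and value name as given above; verify against your Spark build):

```
# conf/spark-defaults.conf — disable Hive schema inference at query time
spark.sql.hive.caseSensitiveInferenceMode   NEVER_INFER
```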

Thanks,

Michael


On Apr 24, 2017, at 2:09 AM, Wenchen Fan <[hidden email]> wrote:


On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau <[hidden email]> wrote:
Whats the regression this fixed in 2.1 from 2.0?

On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan <[hidden email]> wrote:
IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will scan all table files only once, and write the inferred schema back to the metastore so that we don't need to do the schema inference again.

So technically this will introduce a performance regression for the first query, but compared to branch-2.0, it's not a performance regression. And this patch fixed a regression in branch-2.1 for queries that could run in branch-2.0. Personally, I think we should keep INFER_AND_SAVE as the default mode.
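The INFER_AND_SAVE behavior Wenchen describes amounts to a compute-once, write-back cache: pay for the file scan on the first query, persist the result, and skip the scan afterwards. A minimal sketch, with hypothetical names standing in for Spark's internals:

```python
# Sketch of the INFER_AND_SAVE pattern: infer the schema by scanning
# files once, persist it to the metastore, and reuse it afterwards.
# All names here are illustrative, not Spark's actual internals.

scan_count = 0

def scan_files_for_schema(table):
    """Expensive: reads the footer of every file in the table."""
    global scan_count
    scan_count += 1
    return {"userId": "bigint", "eventTime": "timestamp"}

metastore = {}  # stand-in for the Hive metastore's table properties

def get_schema(table):
    if table not in metastore:              # first query: infer and save
        metastore[table] = scan_files_for_schema(table)
    return metastore[table]                 # later queries: no file scan

first = get_schema("events")
second = get_schema("events")
assert first == second and scan_count == 1  # files scanned only once
```

This is why the cost shows up only on the first query against each table, matching the "not a regression compared to branch-2.0" framing above.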

+ [Eric], what do you think?

On Sat, Apr 22, 2017 at 1:37 AM, Michael Armbrust <[hidden email]> wrote:
Thanks for pointing this out, Michael.  Based on the conversation on the PR this seems like a risky change to include in a release branch with a default other than NEVER_INFER.

+Wenchen?  What do you think?

On Thu, Apr 20, 2017 at 4:14 PM, Michael Allman <[hidden email]> wrote:
We've identified the cause of the change in behavior. It is related to the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key and its related functionality was absent from our previous build. The default setting in the current build was causing Spark to attempt to scan all table files during query analysis. Changing this setting to NEVER_INFER disabled this operation and resolved the issue we had.
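For context on why the inference exists at all: the Hive metastore stores column names lowercased, while formats like Parquet preserve case in the files, so resolving a metastore column against the file schema requires reading file footers. A toy sketch of that case-insensitive resolution (illustrative names only, not Spark's code):

```python
# The metastore lowercases column names; Parquet files preserve case.
# Resolving a metastore column against the file schema therefore needs
# a case-insensitive match, which is why Spark may scan file footers.
metastore_cols = ["userid", "eventtime"]   # as stored by the metastore
parquet_cols = ["userId", "eventTime"]     # as stored in the files

def resolve(name, file_cols):
    matches = [c for c in file_cols if c.lower() == name.lower()]
    if len(matches) != 1:
        raise ValueError(f"cannot resolve column {name!r}: {matches}")
    return matches[0]

resolved = [resolve(c, parquet_cols) for c in metastore_cols]
# resolved == ["userId", "eventTime"]
```

With NEVER_INFER, that footer scan is skipped entirely, which is why it resolved the query-analysis slowdown described here.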

Michael


On Apr 20, 2017, at 3:42 PM, Michael Allman <[hidden email]> wrote:

I want to caution that in testing a build from this morning's branch-2.1 we found that Hive partition pruning was not working. We found that Spark SQL was fetching all Hive table partitions for a very simple query whereas in a build from several weeks ago it was fetching only the required partitions. I cannot currently think of a reason for the regression outside of some difference between branch-2.1 from our previous build and branch-2.1 from this morning.

That's all I know right now. We are actively investigating to find the root cause of this problem, and specifically whether this is a problem in the Spark codebase or not. I will report back when I have an answer to that question.

Michael


On Apr 18, 2017, at 11:59 AM, Michael Armbrust <[hidden email]> wrote:








--
Cell : 425-233-8271





--
Cell : 425-233-8271