[VOTE] Apache Spark 2.2.0 (RC1)

[VOTE] Apache Spark 2.2.0 (RC1)

Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc1 (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1235/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running on this release candidate, then reporting any regressions.
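Beyond running workloads, testers commonly verify the published digests before trusting a downloaded artifact. A minimal sketch (the digest algorithm and the in-memory bytes here are stand-ins; use whichever digest files accompany the actual artifacts):

```python
import hashlib

def sha512_hex(data: bytes) -> str:
    """Hex SHA-512 digest, one format published alongside Apache release artifacts."""
    return hashlib.sha512(data).hexdigest()

# Simulated artifact bytes; in practice, read the downloaded archive and
# compare against the digest value in the published digest file.
artifact = b"example artifact bytes"
published_digest = sha512_hex(artifact)  # stand-in for the published value

if sha512_hex(artifact) == published_digest:
    print("digest OK")
else:
    print("digest MISMATCH: do not use this artifact")
```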

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
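To find the tickets in question, committers can query JIRA directly. As a sketch using JIRA's standard REST search endpoint (the JQL below, including the "Target Version/s" field name, is an assumption about how the Spark JIRA is configured):

```python
from urllib.parse import urlencode

# Standard JIRA REST search endpoint on the ASF JIRA instance.
JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"

def search_url(jql: str, max_results: int = 50) -> str:
    """Build a JIRA REST search URL for the given JQL query."""
    return JIRA_SEARCH + "?" + urlencode({"jql": jql, "maxResults": max_results})

# Hypothetical triage query: unresolved tickets still targeting 2.2.0.
jql = 'project = SPARK AND "Target Version/s" = 2.2.0 AND resolution = Unresolved'
print(search_url(jql))
```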

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.1.

Re: [VOTE] Apache Spark 2.2.0 (RC1)

Sean Owen
These are still blockers for 2.2:

SPARK-20501 ML, Graph 2.2 QA: API: New Scala APIs, docs
SPARK-20504 ML 2.2 QA: API: Java compatibility, docs
SPARK-20503 ML 2.2 QA: API: Python API coverage
SPARK-20502 ML, Graph 2.2 QA: API: Experimental, DeveloperApi, final, sealed audit
SPARK-20500 ML, Graph 2.2 QA: API: Binary incompatible changes
SPARK-18813 MLlib 2.2 Roadmap

Joseph you opened most of these just now. Is this an "RC0" we know won't pass? or, wouldn't we normally cut an RC after those things are ready?


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Michael Armbrust
All of those look like QA or documentation, which I don't think needs to block testing on an RC (and in fact probably needs an RC to test?).  Joseph, please correct me if I'm wrong.  It is unlikely this first RC is going to pass, but I wanted to get the ball rolling on testing 2.2.


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Joseph Bradley
This is the same thing as ever for MLlib: Once a branch has been cut, we stop merging features.  Now that features are not being merged, we can begin QA.  I strongly prefer to track QA work in JIRA and to have those items targeted for 2.2.  I also believe that certain QA tasks should be blockers; e.g., if we have not checked for binary or Java compatibility issues in new APIs, then I am not comfortable signing off on a release.  I agree with Michael that these don't block testing on a release; the point of these issues is to do testing.

I'll close the roadmap JIRA though.

--

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

http://databricks.com


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Sean Owen
That makes sense, but we have an RC, not just a branch. I think we've followed the pattern in http://spark.apache.org/versioning-policy.html in the past. This QA generally comes before an RC, right? Because until everything that must happen before a release has happened, someone is saying the RC can't possibly pass. I get it: in practice, this is an "RC0" that can't pass (unless somehow these issues result in zero changes), and there's value in that anyway. I just want to see if we're on the same page about process; maybe we even just say this is how we manage releases, with "RCs" starting before QA ends.


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Joseph Bradley
That's very fair.

For my part, I should have been faster to make these JIRAs and get critical dev community QA started when the branch was cut last week.


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Sean Owen
In reply to this post by Michael Armbrust
By the way the RC looks good. Sigs and license are OK, tests pass with -Phive -Pyarn -Phadoop-2.7. +1 from me.


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Andrew Ash
-1 due to regression from 2.1.1

In 2.2.0-rc1 we bumped the Parquet version from 1.8.1 to 1.8.2 in commit 26a4cba3ff. Parquet 1.8.2 includes a backport from 1.9.0: PARQUET-389, in commit 2282c22c.

This backport caused a regression in Spark: filtering on a column whose name contains dots pushes the filter down into Parquet, and Parquet mishandles the predicate. Spark pushes the String "col.dots" as the column name, but Parquet interprets it as "struct.field", i.e., a predicate on a field of a struct. The result is that the predicate always returns zero rows, causing a data correctness issue.
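The mechanics can be illustrated outside Spark and Parquet. This is a hypothetical sketch, not either project's actual code: the same pushed-down string resolves differently depending on whether the reader treats it as a literal column name or splits it on dots into a struct field path.

```python
# Hypothetical schema with one top-level column literally named "col.dots";
# names here are illustrative, not Parquet's API.
TOP_LEVEL_COLUMNS = {"col.dots"}

def resolve_literal(name: str) -> bool:
    """Treat the pushed-down name as a literal top-level column (what Spark means)."""
    return name in TOP_LEVEL_COLUMNS

def resolve_as_path(name: str) -> bool:
    """Split on dots into a struct field path (how the reader misinterprets it)."""
    path = name.split(".")  # ["col", "dots"]: looks like struct "col", field "dots"
    return len(path) == 1 and path[0] in TOP_LEVEL_COLUMNS

print(resolve_literal("col.dots"))  # True: the column exists
print(resolve_as_path("col.dots"))  # False: no struct "col", so the predicate
                                    # matches nothing and zero rows come back
```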

This issue is filed in Spark as SPARK-20364 and has a fix up at PR #17680.

I nominate SPARK-20364 as a release blocker due to the data correctness regression.

Thanks!
Andrew


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Koert Kuipers
This is about column names containing dots that do not refer to fields inside structs? So not a.b as in field b inside struct a, but a literal field called "a.b"? I didn't even know that was supported at all. It's something I would never try, because it sounds like a bad idea to go there...


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Kazuaki Ishizaki
In reply to this post by Michael Armbrust
+1 (non-binding)

I tested it on Ubuntu 16.04 with OpenJDK 8 on ppc64le. All of the tests for core have passed.

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Run completed in 15 minutes, 45 seconds.
Total number of tests run: 1937
Suites: completed 205, aborted 0
Tests: succeeded 1937, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:26 min
[INFO] Finished at: 2017-04-29T02:23:08+09:00
[INFO] Final Memory: 53M/491M
[INFO] ------------------------------------------------------------------------

Kazuaki Ishizaki,




Re: [VOTE] Apache Spark 2.2.0 (RC1)

Koert Kuipers
In reply to this post by Sean Owen
We have been testing the 2.2.0 snapshots over the last few weeks in our in-house unit tests, integration tests, and real workloads, and we are very happy with it. The only issue I had so far (some encoders no longer being serializable) has already been dealt with by Wenchen.


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Hyukjin Kwon
SPARK-20364 describes a bug, but I am unclear that we should call it a regression that blocks a release.

The current master produces a wrong result when there are dots in column names for Parquet in some cases, but those same cases did not even work in past releases: they threw an exception instead.

So this looks like a bug we should definitely fix, but not a regression.

In more detail, I tested these cases as below:


Spark 1.6.3
val path = "/tmp/foo"
Seq(Tuple1(Some(1)), Tuple1(None)).toDF("col.dots").write.parquet(path)
sqlContext.read.parquet(path).where("`col.dots` IS NOT NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...
sqlContext.read.parquet(path).where("`col.dots` IS NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Spark 2.0.2

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...
spark.read.parquet(path).where("`col.dots` IS NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Spark 2.1.0

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...
spark.read.parquet(path).where("`col.dots` IS NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Spark 2.1.1 RC4

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...
spark.read.parquet(path).where("`col.dots` IS NULL").show()
java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Current master

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
+--------+
|col.dots|
+--------+
+--------+
spark.read.parquet(path).where("`col.dots` IS NULL").show()
+--------+
|col.dots|
+--------+
|    null|
+--------+
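For anyone who needs these filters to work before a fix lands, here is an untested sketch of a possible workaround. The exceptions above originate in Parquet's filter pushdown, so turning pushdown off for the session may sidestep them; `spark.sql.parquet.filterPushdown` is a real config key, but whether this also avoids the wrong result on master is my assumption, not something verified in this thread:

```scala
// Sketch only: disable Parquet filter pushdown so the dotted column name
// never reaches Parquet's predicate code. Costs some scan performance.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")

val path = "/tmp/foo"  // same example path as above
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
```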


2017-04-29 2:57 GMT+09:00 Koert Kuipers <[hidden email]>:
we have been testing the 2.2.0 snapshots in the last few weeks for inhouse unit tests, integration tests and real workloads and we are very happy with it. the only issue i had so far (some encoders not being serializable anymore) has already been dealt with by wenchen.

On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen <[hidden email]> wrote:
By the way the RC looks good. Sigs and license are OK, tests pass with -Phive -Pyarn -Phadoop-2.7. +1 from me.

On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc1 (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:

Release artifacts are signed with the following key:

The staging repository for this release can be found at:

The documentation corresponding to this release can be found at:


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running on this release candidate, then reporting any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.1.



Re: [VOTE] Apache Spark 2.2.0 (RC1)

zero323
In reply to this post by Sean Owen

I am not sure if it is relevant but explode_outer and posexplode_outer seem to be broken: SPARK-20534
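For context, the intended contrast between the two is sketched below (a spark-shell sketch of the documented semantics, not a reproduction of the bug; the example data is made up):

```scala
// explode drops rows whose array is null or empty;
// explode_outer keeps them, emitting null for the exploded column.
import spark.implicits._
import org.apache.spark.sql.functions.{explode, explode_outer}

val df = Seq(
  (1, Seq("a", "b")),
  (2, Seq.empty[String]),
  (3, null.asInstanceOf[Seq[String]])
).toDF("id", "xs")

df.select($"id", explode($"xs")).show()        // rows for id 1 only
df.select($"id", explode_outer($"xs")).show()  // ids 2 and 3 kept with null
```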




Re: [VOTE] Apache Spark 2.2.0 (RC1)

Herman van Hövell tot Westerflier-2
Maciej, this is definitely a bug. I have opened https://github.com/apache/spark/pull/17810 to fix this. I don't think this should be a blocker for the release of 2.2, if there is another RC we will include it.





--

Herman van Hövell

Software Engineer

Databricks Inc.

[hidden email]

+31 6 420 590 27

databricks.com



Join Databricks at Spark Summit 2017 in San Francisco, the world's largest event for the Apache Spark community.


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Sean Owen
In reply to this post by Michael Armbrust
I have one more issue that, if it needs to be fixed, needs to be fixed for 2.2.0.

I'm fixing build warnings for the release and noticed that checkstyle actually complains there are some Java methods named in TitleCase, like `ProcessingTimeTimeout`:


Easy enough to fix and it's right, that's not conventional. However I wonder if it was done on purpose to match a class name?

I think this is one for @tdas


Re: [VOTE] Apache Spark 2.2.0 (RC1)

Michael Heuer
Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does not bump the dependency version for avro (currently at 1.7.7).  Though perhaps not clear from the issue I reported [0], this means that Spark is internally inconsistent, in that a call through parquet (which depends on avro 1.8.0 [1]) may throw errors at runtime when it hits avro 1.7.7 on the classpath.  Avro 1.8.0 is not binary compatible with 1.7.7.

[0] - https://issues.apache.org/jira/browse/SPARK-19697
[1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96



Re: [VOTE] Apache Spark 2.2.0 (RC1)

Sean Owen
See discussion at https://github.com/apache/spark/pull/17163 -- I think the issue is that fixing this trades one problem for a slightly bigger one.



Re: [VOTE] Apache Spark 2.2.0 (RC1)

Ryan Blue
I agree with Sean. Spark only pulls in parquet-avro for tests. For execution, it implements the record materialization APIs in Parquet to go directly to Spark SQL rows. This doesn't actually leak an Avro 1.8 dependency into Spark as far as I can tell.

rb





--
Ryan Blue
Software Engineer
Netflix

Re: [VOTE] Apache Spark 2.2.0 (RC1)

Frank Austin Nothaft
Hi Ryan et al,

The issue we’ve seen using a build of the Spark 2.2.0 branch from a downstream project is that parquet-avro uses one of the new Avro 1.8.0 methods, and you get a NoSuchMethodError since Spark pins Avro 1.7.7 as a dependency. My colleague Michael (who posted earlier on this thread) documented this in SPARK-19697. I know that Spark has unit tests that check this compatibility issue, but it looks like there was a recent change that sets a test-scope dependency on Avro 1.8.0, which masks this issue in the unit tests. With this error, you can’t use the ParquetAvroOutputFormat from an application running on Spark 2.2.0.

Regards,

Frank Austin Nothaft
202-340-0466



Re: [VOTE] Apache Spark 2.2.0 (RC1)

Ryan Blue
Frank,

The issue you're running into is caused by using parquet-avro with Avro 1.7. Can't your downstream project set the Avro dependency to 1.8? Spark can't update Avro because it is a breaking change that would force users to rebuild specific Avro classes in some cases. But you should be free to use Avro 1.8 to avoid the problem.
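For a downstream sbt build, such an override could be sketched as below; the exact Avro version to pin is my assumption (anything binary-compatible with what parquet-avro 1.8.2 expects), not a recommendation made in this thread:

```scala
// build.sbt sketch: force Avro 1.8.0 onto the runtime classpath so the
// newer methods that parquet-avro 1.8.2 calls actually resolve.
dependencyOverrides += "org.apache.avro" % "avro" % "1.8.0"
```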
