[VOTE] Release Spark 2.4.6 (RC8)

Holden Karau
Please vote on releasing the following candidate as Apache Spark version 2.4.6.

The vote is open until June 5th at 9AM PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
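
The passing rule above can be read as a small predicate. This is only a sketch, assuming "majority" means more binding +1 votes than binding -1 votes; the function and parameter names are illustrative, and the counts in the usage lines are hypothetical, not the tally of this vote.

```python
def vote_passes(binding_plus_ones: int,
                binding_minus_ones: int,
                minimum_plus_ones: int = 3) -> bool:
    """Return True if the binding (PMC) votes satisfy the stated rule."""
    # Assumption: "majority" means more binding +1 than binding -1 votes.
    has_majority = binding_plus_ones > binding_minus_ones
    # The announcement requires at least 3 +1 votes.
    has_minimum = binding_plus_ones >= minimum_plus_ones
    return has_majority and has_minimum


# Hypothetical counts, for illustration only.
print(vote_passes(3, 0))
print(vote_passes(2, 0))
```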

[ ] +1 Release this package as Apache Spark 2.4.6
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

There are currently no issues targeting 2.4.6 (try project = SPARK AND "Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress"))
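
For anyone who wants to re-run that query, here is a minimal sketch that turns the JQL above into a link. It assumes the usual JIRA search endpoint under /jira/issues/ with a "jql" query parameter; the helper name open_issues_url is made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the standard JIRA issue-search endpoint on the ASF instance.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def open_issues_url(version: str) -> str:
    """Build a search link for open issues targeting the given version."""
    jql = (
        'project = SPARK AND "Target Version/s" = "{v}" '
        'AND status in (Open, Reopened, "In Progress")'
    ).format(v=version)
    # urlencode percent-encodes quotes, slashes, and parentheses in the JQL.
    return JIRA_SEARCH + "?" + urlencode({"jql": jql})


print(open_issues_url("2.4.6"))
```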

The tag to be voted on is v2.4.6-rc8 (commit 807e0a484d1de767d1f02bd8a622da6450bdf940):
https://github.com/apache/spark/tree/v2.4.6-rc8

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/
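
As a rough sketch of checking a downloaded release file against its published digest: the helper names below are illustrative, and the code makes no assumption about how the digest file is laid out, so pass in only the hex digits copied from it. It covers SHA-512 digests only; checking the signatures still needs gpg and the KEYS file.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, published_hex: str) -> bool:
    """Compare a file's digest with hex digits copied from the digest file.

    Whitespace and letter case in published_hex are ignored, since digest
    files often group or upper-case the hex digits.
    """
    expected = "".join(published_hex.split()).lower()
    return sha512_of(path) == expected
```

Usage would look like matches_published_digest(Path("<downloaded file>"), "<hex digits>"), with both values supplied by the tester.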

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1349/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/

The list of bug fixes going into 2.4.6 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12346781

This release is using the release script of the tag v2.4.6-rc8.

FAQ

=========================
What happened to the other RCs?
=========================

The parallel Maven build caused some flakiness, so I wasn't comfortable releasing them. I backported the fix from the 3.0 branch for this release. I've got a proposed change to the build script so that, in the future, we only push tags once the build succeeds, but it does not block this release.

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate,
then reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and
test with the RC (make sure to clean up the artifact cache before and
after so you don't end up building with an out-of-date RC going
forward).
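
The PySpark half of this can be sketched with the standard library alone. Everything named here is an assumption for illustration: the helper names, the "rc-env" directory, and in particular the artifact file name in the commented usage, which should be checked against the listing of the -bin directory above. The --no-cache-dir flag is used so pip does not reuse a previously downloaded RC.

```python
import os
import subprocess
import venv
from pathlib import Path


def pip_install_command(env_dir: Path, package_url: str) -> list:
    """Build the command that installs a package into the given virtual env."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    python = env_dir / bin_dir / "python"
    # --no-cache-dir avoids picking up an out-of-date RC from pip's cache.
    return [str(python), "-m", "pip", "install", "--no-cache-dir", package_url]


def install_rc(env_dir: Path, package_url: str) -> None:
    """Create a fresh virtual env and install the RC into it."""
    venv.create(env_dir, with_pip=True, clear=True)
    subprocess.run(pip_install_command(env_dir, package_url), check=True)


# Example call (not executed here). The file name is an assumed artifact name:
# install_rc(
#     Path("rc-env"),
#     "https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/pyspark-2.4.6.tar.gz",
# )
```

After that, running an existing PySpark job with the interpreter inside the virtual env exercises the RC rather than whatever Spark is installed system-wide.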

===========================================
What should happen to JIRA tickets still targeting 2.4.6?
===========================================

The current list of open tickets targeted at 2.4.6 can be found by going to
https://issues.apache.org/jira/projects/SPARK and searching for "Target Version/s" = 2.4.6

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Please retarget everything else to an
appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.


--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: [VOTE] Release Spark 2.4.6 (RC8)

cloud0fan
+1 (binding), although I don't know why we jump from RC 3 to RC 8...

On Mon, Jun 1, 2020 at 7:47 AM Holden Karau <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

Sean Owen-2
I suspect there were some problems with the release script to fix.

+1 from me, same as last time. This still appears to be OK in licenses and sigs, and source compiles and passes tests.

On Sun, May 31, 2020 at 11:23 PM Wenchen Fan <[hidden email]> wrote:
+1 (binding), although I don't know why we jump from RC 3 to RC 8...

On Mon, Jun 1, 2020 at 7:47 AM Holden Karau <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

DB Tsai-3
In reply to this post by cloud0fan
+1 (binding), thanks!

On Sun, May 31, 2020 at 9:23 PM Wenchen Fan <[hidden email]> wrote:
+1 (binding), although I don't know why we jump from RC 3 to RC 8...

On Mon, Jun 1, 2020 at 7:47 AM Holden Karau <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

Holden Karau
In reply to this post by Sean Owen-2
Yes, that's correct: the release script needs a bit of work, and it has diverged a bit from 3.0 as well. I'll follow up with some more PRs in addition to the current one I have.

On Sun, May 31, 2020 at 10:08 PM Sean Owen <[hidden email]> wrote:
I suspect there were some problems with the release script to fix.

+1 from me, same as last time. This still appears to be OK in licenses and sigs, and source compiles and passes tests.

On Sun, May 31, 2020 at 11:23 PM Wenchen Fan <[hidden email]> wrote:
+1 (binding), although I don't know why we jump from RC 3 to RC 8...

On Mon, Jun 1, 2020 at 7:47 AM Holden Karau <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

Prashant Sharma
+1

Thanks!

On Mon, Jun 1, 2020 at 10:50 AM Holden Karau <[hidden email]> wrote:
Yes, that's correct: the release script needs a bit of work, and it has diverged a bit from 3.0 as well. I'll follow up with some more PRs in addition to the current one I have.

On Sun, May 31, 2020 at 10:08 PM Sean Owen <[hidden email]> wrote:
I suspect there were some problems with the release script to fix.

+1 from me, same as last time. This still appears to be OK in licenses and sigs, and source compiles and passes tests.

On Sun, May 31, 2020 at 11:23 PM Wenchen Fan <[hidden email]> wrote:
+1 (binding), although I don't know why we jump from RC 3 to RC 8...

On Mon, Jun 1, 2020 at 7:47 AM Holden Karau <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

Mridul Muralidharan
In reply to this post by Holden Karau

+1 (binding)


Thanks,
Mridul


On Sun, May 31, 2020 at 4:47 PM Holden Karau <[hidden email]> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

Tom Graves-2
In reply to this post by Holden Karau
 +1

Tom

On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau <[hidden email]> wrote:


Please vote on releasing the following candidate as Apache Spark version 2.4.6.


Re: [VOTE] Release Spark 2.4.6 (RC8)

Dongjoon Hyun-2
+1

Bests,
Dongjoon

On Wed, Jun 3, 2020 at 5:59 AM Tom Graves <[hidden email]> wrote:
 +1

Tom


Re: [VOTE] Release Spark 2.4.6 (RC8)

Xiao Li
I just downloaded it to my local MacBook and tried to create a table using the pre-built PySpark. It looks like the conf "spark.sql.warehouse.dir" does not take effect: it tries to create a directory at "file:/user/hive/warehouse/t1". I have not done any investigation yet. Has anyone else hit the same issue?

C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
Python 2.7.16 (default, Jan 27 2020, 04:46:15)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/

Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
SparkSession available as 'spark'.
>>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
+-----------------------+-------------------------------------------------+
|key                    |value                                            |
+-----------------------+-------------------------------------------------+
|spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
+-----------------------+-------------------------------------------------+

>>> spark.sql("create table t1 (col1 int)")
20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'
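
The mismatch in this session is between the value reported for spark.sql.warehouse.dir and the location the metastore tried to use for t1. A minimal sketch of that comparison follows, using only the Python standard library. The helper name location_under_warehouse is hypothetical and not part of Spark, and the assumption that a managed table would be placed directly under the configured warehouse directory is only the expectation implied by the report, not a statement about Spark internals.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def location_under_warehouse(table_location, warehouse_dir):
    """Return True if table_location lies inside warehouse_dir.

    Hypothetical helper for illustration only. Both arguments may be
    plain POSIX paths or file: URIs such as "file:/user/hive/warehouse/t1".
    """
    def to_path(value):
        # urlparse strips a leading "file:" scheme; plain paths pass through.
        return PurePosixPath(urlparse(value).path)

    table_path = to_path(table_location)
    warehouse_path = to_path(warehouse_dir)
    # The warehouse directory must be one of the table path's ancestors.
    return warehouse_path in table_path.parents


# Values copied from the session above.
configured = "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
reported = "file:/user/hive/warehouse/t1"
print(location_under_warehouse(reported, configured))
```

With the two values from the transcript, the check returns False, which is the symptom being reported: the table location falls outside the configured warehouse directory.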



Re: [VOTE] Release Spark 2.4.6 (RC8)

Nicholas Chammas
I believe that was fixed in 3.0 and there was a decision not to backport the fix: SPARK-31170


Re: [VOTE] Release Spark 2.4.6 (RC8)

Xiao Li
Yes. Spark 3.0 RC2 works well.

I think the current behavior in Spark 2.4 hurts adoption, especially for new users who want to try Spark in their local environment.

It impacts all our built-in clients, such as the Scala shell and PySpark. Should we consider backporting the fix to 2.4?

Although this fixes the bug, it will also introduce a behavior change. We should publicly document it and mention it in the release notes. Let us review it more carefully and understand the risk and impact.

Thanks,

Xiao


Re: [VOTE] Release Spark 2.4.6 (RC8)

Mridul Muralidharan

Is this a behavior change in 2.4.x from an earlier version?
Or are we proposing to introduce new functionality to help with adoption?

Regards,
Mridul



Re: [VOTE] Release Spark 2.4.6 (RC8)

Holden Karau
If this is something we expect to mostly impact new users, I think we can push them towards Spark 3 instead of introducing a behaviour change in 2.4.6.

On Wed, Jun 3, 2020 at 12:34 PM Mridul Muralidharan <[hidden email]> wrote:

  Is this a behavior change in 2.4.x from earlier version ?
Or are we proposing to introduce  a functionality to help with adoption ?

Regards,
Mridul


On Wed, Jun 3, 2020 at 10:32 AM Xiao Li <[hidden email]> wrote:
Yes. Spark 3.0 RC2 works well.

I think the current behavior in Spark 2.4 affects the adoption, especially for the new users who want to try Spark in their local environment. 

It impacts all our built-in clients, like Scala Shell and PySpark. Should we consider back-porting it to 2.4? 

Although this fixes the bug, it will also introduce the behavior change. We should publicly document it and mention it in the release note. Let us review it more carefully and understand the risk and impact. 

Thanks,

Xiao

Nicholas Chammas <[hidden email]> 于2020年6月3日周三 上午10:12写道:
I believe that was fixed in 3.0 and there was a decision not to backport the fix: SPARK-31170

On Wed, Jun 3, 2020 at 1:04 PM Xiao Li <[hidden email]> wrote:
Just downloaded it in my local macbook. Trying to create a table using the pre-built PySpark. It sounds like the conf "spark.sql.warehouse.dir" does not take an effect. It is trying to create a directory in "file:/user/hive/warehouse/t1". I have not done any investigation yet. Have any of you hit the same issue?

C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
Python 2.7.16 (default, Jan 27 2020, 04:46:15)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/

Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
SparkSession available as 'spark'.
>>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
+-----------------------+-------------------------------------------------+
|key                    |value                                            |
+-----------------------+-------------------------------------------------+
|spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
+-----------------------+-------------------------------------------------+

>>> spark.sql("create table t1 (col1 int)")
20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'
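
One way to make the mismatch in the session above explicit is to pull the path out of the exception message and compare it with the configured warehouse directory. This is only an illustrative sketch using the Python standard library: the helper name `location_from_error` is hypothetical, and both strings are copied from the transcript rather than read from a live Spark session.

```python
import re

# Values copied from the transcript above (assumption: not read from a live session).
configured_dir = "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
error_message = (
    "org.apache.hadoop.hive.ql.metadata.HiveException: "
    "MetaException(message:file:/user/hive/warehouse/t1 "
    "is not a directory or unable to create one);"
)


def location_from_error(message):
    """Hypothetical helper: return the local path named in the MetaException, or None."""
    match = re.search(r"message:file:(\S+) is not a directory", message)
    return match.group(1) if match else None


attempted = location_from_error(error_message)
# True only if the attempted table location sits under the configured directory.
uses_configured_dir = attempted is not None and attempted.startswith(configured_dir + "/")

print("attempted location:", attempted)
print("under configured warehouse dir:", uses_configured_dir)
```

For the values in the transcript, the attempted location is outside the configured directory, which matches the observation that the setting is shown by `set` but not used when the table is created.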


On Wed, Jun 3, 2020 at 9:18 AM Dongjoon Hyun <[hidden email]> wrote:
+1

Bests,
Dongjoon

On Wed, Jun 3, 2020 at 5:59 AM Tom Graves <[hidden email]> wrote:
 +1

Tom

--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: [VOTE] Release Spark 2.4.6 (RC8)

Holden Karau
Binding +1 and the vote passes. I’ll upload the release today/this weekend.

On a personal note, I hope everyone is doing as well as possible with the pandemic and police violence. I’ve been grappling with the implications of our work as a community.

+1s (* binding):
Wenchen Fan *
Sean Owen *
DB Tsai *
Prashant Sharma *
Mridul Muralidharan *
Tom Graves *
Dongjoon Hyun *

0s:
None

-1s:
None

Comment without vote:
Nicholas Chammas
Xiao Li
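
The pass rule from the vote call (a majority of +1 PMC votes, with a minimum of 3 +1 votes) can be checked against the tally above. This is a minimal sketch, assuming "majority" means more binding +1 votes than binding -1 votes; the function name `vote_passes` is made up for illustration and is not part of any Apache tooling.

```python
def vote_passes(binding_plus_ones, binding_minus_ones, minimum_plus_ones=3):
    """Return True if binding +1 votes meet the minimum and outnumber binding -1 votes."""
    plus = len(binding_plus_ones)
    minus = len(binding_minus_ones)
    return plus >= minimum_plus_ones and plus > minus


# Binding +1 votes as listed in the tally above; no binding -1 votes were cast.
binding_plus_ones = [
    "Wenchen Fan", "Sean Owen", "DB Tsai", "Prashant Sharma",
    "Mridul Muralidharan", "Tom Graves", "Dongjoon Hyun",
]
binding_minus_ones = []

print(vote_passes(binding_plus_ones, binding_minus_ones))
```

The comments without a vote are left out on purpose, since only binding votes count toward the rule.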

On Wed, Jun 3, 2020 at 12:49 PM Holden Karau <[hidden email]> wrote:
If this is something we expect to mostly impact new users, I think we can push them towards Spark 3 instead of introducing a behaviour change in 2.4.6.

On Wed, Jun 3, 2020 at 12:34 PM Mridul Muralidharan <[hidden email]> wrote:

Is this a behavior change in 2.4.x from an earlier version?
Or are we proposing to introduce functionality to help with adoption?

Regards,
Mridul


On Wed, Jun 3, 2020 at 10:32 AM Xiao Li <[hidden email]> wrote:
Yes. Spark 3.0 RC2 works well.

I think the current behavior in Spark 2.4 affects adoption, especially for new users who want to try Spark in their local environment.

It impacts all our built-in clients, like the Scala shell and PySpark. Should we consider backporting the fix to 2.4?

Although this fixes the bug, it will also introduce a behavior change. We should publicly document it and mention it in the release notes. Let us review it more carefully and understand the risk and impact.

Thanks,

Xiao

On Wed, Jun 3, 2020 at 10:12 AM Nicholas Chammas <[hidden email]> wrote:
I believe that was fixed in 3.0 and there was a decision not to backport the fix: SPARK-31170.

On Wed, Jun 3, 2020 at 1:04 PM Xiao Li <[hidden email]> wrote:
I just downloaded it on my local MacBook and tried to create a table using the pre-built PySpark. It looks like the conf "spark.sql.warehouse.dir" does not take effect: it is trying to create a directory in "file:/user/hive/warehouse/t1". I have not done any investigation yet. Has anyone else hit the same issue?

--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9