[VOTE] Spark 2.3.1 (RC4)

[VOTE] Spark 2.3.1 (RC4)

Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1.

Given that I expect at least a few people to be busy with Spark Summit next
week, I'm taking the liberty of setting an extended voting period. The vote
will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PDT).

The vote passes if a majority of the votes cast are +1, including at least
3 +1 votes from the PMC.

[ ] +1 Release this package as Apache Spark 2.3.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
https://github.com/apache/spark/tree/v2.3.1-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1272/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/

The list of bug fixes going into 2.3.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342432

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before and after
so you don't end up building with an out-of-date RC going forward).
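For the PySpark side, a minimal sketch of that workflow might look like the
following. The pyspark tarball name under the v2.3.1-rc4-bin/ directory is an
assumption here; check the directory listing for the exact filename.

```shell
# Sketch: test the RC's PySpark in an isolated virtual env.
# The exact tarball name is an assumption; verify it against
# https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
python -m venv spark-rc-test
source spark-rc-test/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz"
python -c "import pyspark; print(pyspark.__version__)"
# Now run your existing workload against this environment and
# report any regressions to the list.
deactivate
```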

===========================================
What should happen to JIRA tickets still targeting 2.3.1?
===========================================

The current list of open tickets targeted at 2.3.1 can be found at:
https://s.apache.org/Q3Uo

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else should be retargeted to an
appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.


--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [VOTE] Spark 2.3.1 (RC4)

Marcelo Vanzin
Starting with my own +1 (binding).

--
Marcelo



Re: [VOTE] Spark 2.3.1 (RC4)

Nicholas Chammas

I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4 using Flintrock. However, trying to load the hadoop-aws package gave me some errors.

$ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4

<snipped>

:: problems summary ::
:::: WARNINGS
                [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
        ==== local-m2-cache: tried
          file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
                [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
        ==== local-m2-cache: tried
          file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
                [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
        ==== local-m2-cache: tried
          file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
                [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
        ==== local-m2-cache: tried
          file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar

I’d guess I’m probably using the wrong version of hadoop-aws, but I called make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to try.

Any quick pointers?

Nick



Re: [VOTE] Spark 2.3.1 (RC4)

Mark Hamstra
There is no hadoop-2.8 profile. Use hadoop-2.7, which is effectively hadoop-2.7+


Re: [VOTE] Spark 2.3.1 (RC4)

Marcelo Vanzin
In reply to this post by Nicholas Chammas
Using the hadoop-aws package is probably going to be a little more
complicated than that. The best bet is to use a custom build of Spark
that includes it (use -Phadoop-cloud). Otherwise you're probably
looking at some nasty dependency issues, especially if you end up
mixing different versions of Hadoop.
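As a sketch, a custom distribution including the cloud connectors might be
built like this. The flags follow the Spark 2.3 build documentation, but treat
the exact invocation (profile names, --name value) as an assumption to verify
against docs/building-spark.html for the release.

```shell
# Sketch: build a Spark distribution that bundles the hadoop-cloud module
# (hadoop-aws and its transitive dependencies), so no --packages download
# is needed at runtime. Run from a Spark source checkout of the RC tag.
./dev/make-distribution.sh --name hadoop-cloud --tgz \
  -Phadoop-2.7 -Phadoop-cloud -DskipTests
```

This sidesteps the version-mixing problem, since the hadoop-aws version is
resolved at build time against the same Hadoop the distribution ships with.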


--
Marcelo


Re: [VOTE] Spark 2.3.1 (RC4)

Nicholas Chammas

Building with -Phadoop-2.7 didn’t help, and if I remember correctly, building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release, so it appears something has changed since then.

I wasn’t familiar with -Phadoop-cloud, but I can try that.

My goal here is simply to confirm that this release of Spark works with hadoop-aws like past releases did, particularly for Flintrock users who use Spark with S3A.

We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds with every Spark release. If the -hadoop2.7 release build won’t work with hadoop-aws anymore, are there plans to provide a new build type that will?

Apologies if the question is poorly formed. I’m batting a bit outside my league here. Again, my goal is simply to confirm that I and my users still have a way to use s3a://. In the past, that way was simply to call pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar. If that will no longer work, I’m trying to confirm whether the change in behavior is intentional or acceptable (as a review for the Spark project) and to figure out what I need to change (as due diligence for Flintrock’s users).

Nick



Re: [VOTE] Spark 2.3.1 (RC4)

rxin
In reply to this post by Marcelo Vanzin

+1


Re: [VOTE] Spark 2.3.1 (RC4)

Marcelo Vanzin
In reply to this post by Nicholas Chammas
I have personally never tried to include hadoop-aws that way. But at
the very least, I'd try to use the same version of Hadoop as the Spark
build (2.7.3 IIRC). I don't really expect a different version to work,
and if it did in the past it definitely was not by design.
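In practice that would mean pinning hadoop-aws to the Hadoop version bundled
with the -hadoop2.7 release build, something like the sketch below. The 2.7.3
version number is taken from the "2.7.3 IIRC" above and should be verified
against the release's bundled hadoop-* jars before relying on it.

```shell
# Sketch: match hadoop-aws to the Hadoop version Spark was built against
# (2.7.3 per the message above; confirm by inspecting the jars/ directory
# of the -hadoop2.7 distribution).
pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3
```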

On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
<[hidden email]> wrote:

> Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
> building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release, so
> it appears something has changed since then.
>
> I wasn’t familiar with -Phadoop-cloud, but I can try that.
>
> My goal here is simply to confirm that this release of Spark works with
> hadoop-aws like past releases did, particularly for Flintrock users who use
> Spark with S3A.
>
> We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds with
> every Spark release. If the -hadoop2.7 release build won’t work with
> hadoop-aws anymore, are there plans to provide a new build type that will?
>
> Apologies if the question is poorly formed. I’m batting a bit outside my
> league here. Again, my goal is simply to confirm that I/my users still have
> a way to use s3a://. In the past, that way was simply to call pyspark
> --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar. If
> that will no longer work, I’m trying to confirm that the change of behavior
> is intentional or acceptable (as a review for the Spark project) and figure
> out what I need to change (as due diligence for Flintrock’s users).
>
> Nick
>
>
> On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <[hidden email]> wrote:
>>
>> Using the hadoop-aws package is probably going to be a little more
>> complicated than that. The best bet is to use a custom build of Spark
>> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> looking at some nasty dependency issues, especially if you end up
>> mixing different versions of Hadoop.
>>
>> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> <[hidden email]> wrote:
>> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> > using
>> > Flintrock. However, trying to load the hadoop-aws package gave me some
>> > errors.
>> >
>> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >
>> > <snipped>
>> >
>> > :: problems summary ::
>> > :::: WARNINGS
>> >     [NOT FOUND] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >       ==== local-m2-cache: tried file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >     [NOT FOUND] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >       ==== local-m2-cache: tried file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >     [NOT FOUND] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >       ==== local-m2-cache: tried file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >     [NOT FOUND] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >       ==== local-m2-cache: tried file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >
>> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>> > called
>> > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to try.
>> >
>> > Any quick pointers?
>> >
>> > Nick
>> >
>> >
>> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <[hidden email]>
>> > wrote:
>> >>
>> >> Starting with my own +1 (binding).
>> >>



--
Marcelo


Re: [VOTE] Spark 2.3.1 (RC4)

Nicholas Chammas

pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me either (even building with -Phadoop-2.7). I guess I’ve been relying on an unsupported pattern and will need to figure something else out going forward in order to use s3a://.
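
One thing worth ruling out before abandoning the `--packages` route: the `[NOT FOUND]` entries earlier in the thread all point at the local Ivy/Maven caches, and a stale cache entry (pom present, jar missing) can produce exactly that failure. The vote email itself recommends cleaning the artifact cache around RC testing. A hedged workaround, removing just the offending group's cache entries before retrying, might look like this (paths are the Ivy/Maven defaults and may differ per machine):

```shell
# Remove possibly-stale local cache entries for the missing artifacts
# before retrying `pyspark --packages ...`. The group chosen here
# (com.sun.jersey) matches the [NOT FOUND] lines reported in the thread.
rm -rf "$HOME/.ivy2/cache/com.sun.jersey" \
       "$HOME/.m2/repository/com/sun/jersey"
```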



Re: [VOTE] Spark 2.3.1 (RC4)

Sean Owen-3
In reply to this post by Marcelo Vanzin
+1 from me with the same comments as in the last RC.




Re: [VOTE] Spark 2.3.1 (RC4)

Marcelo Vanzin
In reply to this post by Nicholas Chammas
If you're building your own Spark, definitely try the hadoop-cloud
profile. Then you don't even need to pull anything at runtime,
everything is already packaged with Spark.
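
For reference, a build along those lines might be invoked as sketched below. The command is printed here rather than run; only `-Phadoop-cloud` comes from this thread, while the distribution name and the `-Phadoop-2.7` profile are illustrative:

```shell
# Sketch of a make-distribution.sh invocation that bundles the cloud
# connectors via the hadoop-cloud profile. Printed, not executed here;
# --name and -Phadoop-2.7 are example values.
profiles="-Phadoop-2.7 -Phadoop-cloud"
cmd="./dev/make-distribution.sh --name hadoop2.7-cloud --tgz $profiles"
echo "$cmd"
```

Running the printed command from a Spark source checkout would produce a tarball whose `jars/` directory already includes hadoop-aws and its dependencies, so nothing needs to be pulled at runtime.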




--
Marcelo



Re: [VOTE] Spark 2.3.1 (RC4)

cloud0fan
+1

>> >> >> > What should happen to JIRA tickets still targeting 2.3.1?
>> >> >> > ===========================================
>> >> >> >
>> >> >> > The current list of open tickets targeted at 2.3.1 can be found
>> >> >> > at:
>> >> >> > https://s.apache.org/Q3Uo
>> >> >> >
>> >> >> > Committers should look at those and triage. Extremely important
>> >> >> > bug
>> >> >> > fixes, documentation, and API tweaks that impact compatibility
>> >> >> > should
>> >> >> > be worked on immediately. Everything else please retarget to an
>> >> >> > appropriate release.
>> >> >> >
>> >> >> > ==================
>> >> >> > But my bug isn't fixed?
>> >> >> > ==================
>> >> >> >
>> >> >> > In order to make timely releases, we will typically not hold the
>> >> >> > release unless the bug in question is a regression from the
>> >> >> > previous
>> >> >> > release. That being said, if there is something which is a
>> >> >> > regression
>> >> >> > that has not been correctly targeted please ping me or a committer
>> >> >> > to
>> >> >> > help target the issue.
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Marcelo
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Marcelo
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe e-mail: [hidden email]
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>>
>>
>>
>> --
>> Marcelo



--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]



Re: [VOTE] Spark 2.3.1 (RC4)

Nicholas Chammas
In reply to this post by Marcelo Vanzin
I'll give that a try, but I'll still have to figure out what to do if none of the release builds work with hadoop-aws, since Flintrock deploys Spark release builds to set up a cluster. Building Spark is slow, so we only do it if the user specifically requests a Spark version by git hash. (This is basically how spark-ec2 did things, too.)

On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <[hidden email]> wrote:
If you're building your own Spark, definitely try the hadoop-cloud
profile. Then you don't even need to pull anything at runtime,
everything is already packaged with Spark.
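
The custom-build route described above might look like the following sketch (the distribution name "with-cloud" is illustrative, and the exact flag combination is an assumption based on the profiles named in this thread, not on release tooling docs):

```shell
# Sketch of building a Spark distribution that bundles the cloud connectors.
# Assumption: run from a Spark 2.3.x source checkout; "with-cloud" is just an
# illustrative name. With -Phadoop-cloud the S3A connector jars ship inside
# the distribution, so no --packages download is needed at runtime.
BUILD_CMD='./dev/make-distribution.sh --name with-cloud --tgz -Phadoop-2.7 -Phadoop-cloud'
echo "$BUILD_CMD"   # run this command from the Spark source root
```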

On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
<[hidden email]> wrote:
> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> unsupported pattern and will need to figure something else out going forward
> in order to use s3a://.
>
>
> On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <[hidden email]> wrote:
>>
>> I have personally never tried to include hadoop-aws that way. But at
>> the very least, I'd try to use the same version of Hadoop as the Spark
>> build (2.7.3 IIRC). I don't really expect a different version to work,
>> and if it did in the past it definitely was not by design.
>>
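
The version-matching suggestion quoted above amounts to the following sketch: pin hadoop-aws to the same Hadoop version the Spark build bundles, rather than a newer 2.8.x artifact (2.7.3 is Marcelo's "IIRC" recollection from this thread, not verified here):

```shell
# Match the hadoop-aws artifact version to the Hadoop version bundled with
# the Spark distribution (assumed 2.7.3 for the -Phadoop-2.7 builds of 2.3.x).
HADOOP_VERSION=2.7.3
PKG_COORD="org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}"
echo "pyspark --packages ${PKG_COORD}"
```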
>> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> <[hidden email]> wrote:
>> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>> > so
>> > it appears something has changed since then.
>> >
>> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >
>> > My goal here is simply to confirm that this release of Spark works with
>> > hadoop-aws like past releases did, particularly for Flintrock users who
>> > use
>> > Spark with S3A.
>> >
>> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>> > with
>> > every Spark release. If the -hadoop2.7 release build won’t work with
>> > hadoop-aws anymore, are there plans to provide a new build type that
>> > will?
>> >
>> > Apologies if the question is poorly formed. I’m batting a bit outside my
>> > league here. Again, my goal is simply to confirm that I/my users still
>> > have
>> > a way to use s3a://. In the past, that way was simply to call pyspark
>> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
>> > If
>> > that will no longer work, I’m trying to confirm that the change of
>> > behavior
>> > is intentional or acceptable (as a review for the Spark project) and
>> > figure
>> > out what I need to change (as due diligence for Flintrock’s users).
>> >
>> > Nick
>> >
>> >
>> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <[hidden email]>
>> > wrote:
>> >>
>> >> Using the hadoop-aws package is probably going to be a little more
>> >> complicated than that. The best bet is to use a custom build of Spark
>> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> looking at some nasty dependency issues, especially if you end up
>> >> mixing different versions of Hadoop.
>> >>
>> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> <[hidden email]> wrote:
>> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>> >> > using
>> >> > Flintrock. However, trying to load the hadoop-aws package gave me
>> >> > some
>> >> > errors.
>> >> >
>> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >
>> >> > <snipped>
>> >> >
>> >> > :: problems summary ::
>> >> > :::: WARNINGS
>> >> >   [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >   [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >   [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >   [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >
>> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>> >> > called
>> >> > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to
>> >> > try.
>> >> >
>> >> > Any quick pointers?
>> >> >
>> >> > Nick

Re: [VOTE] Spark 2.3.1 (RC4)

Denny Lee
+1


Re: [VOTE] Spark 2.3.1 (RC4)

Dongjoon Hyun-2
+1

Bests,
Dongjoon.



Re: [VOTE] Spark 2.3.1 (RC4)

Ricardo Almeida-2
+1 (non-binding) 

On 3 June 2018 at 09:23, Dongjoon Hyun <[hidden email]> wrote:
+1

Bests,
Dongjoon.

On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee <[hidden email]> wrote:
+1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <[hidden email]> wrote:
I'll give that a try, but I'll still have to figure out what to do if none of the release builds work with hadoop-aws, since Flintrock deploys Spark release builds to set up a cluster. Building Spark is slow, so we only do it if the user specifically requests a Spark version by git hash. (This is basically how spark-ec2 did things, too.)


On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <[hidden email]> wrote:
If you're building your own Spark, definitely try the hadoop-cloud
profile. Then you don't even need to pull anything at runtime,
everything is already packaged with Spark.
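One way to sketch such a build, assuming a Spark source checkout (the exact flags are illustrative; `-Phadoop-cloud` is the profile mentioned above):

```shell
# Sketch: build a distribution that bundles hadoop-aws and the matching
# AWS SDK jars via the hadoop-cloud profile, so nothing needs to be
# pulled with --packages at runtime.
./dev/make-distribution.sh --name hadoop2.7-cloud --tgz \
    -Phadoop-2.7 -Phadoop-cloud -DskipTests
```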

On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
<[hidden email]> wrote:
> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> unsupported pattern and will need to figure something else out going forward
> in order to use s3a://.
>
>
> On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <[hidden email]> wrote:
>>
>> I have personally never tried to include hadoop-aws that way. But at
>> the very least, I'd try to use the same version of Hadoop as the Spark
>> build (2.7.3 IIRC). I don't really expect a different version to work,
>> and if it did in the past it definitely was not by design.
>>
>> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> <[hidden email]> wrote:
>> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>> > so it appears something has changed since then.
>> >
>> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >
>> > My goal here is simply to confirm that this release of Spark works with
>> > hadoop-aws like past releases did, particularly for Flintrock users who
>> > use Spark with S3A.
>> >
>> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>> > with every Spark release. If the -hadoop2.7 release build won’t work
>> > with hadoop-aws anymore, are there plans to provide a new build type
>> > that will?
>> >
>> > Apologies if the question is poorly formed. I’m batting a bit outside
>> > my league here. Again, my goal is simply to confirm that I/my users
>> > still have a way to use s3a://. In the past, that way was simply to
>> > call pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something
>> > very similar. If that will no longer work, I’m trying to confirm that
>> > the change of behavior is intentional or acceptable (as a review for
>> > the Spark project) and figure out what I need to change (as due
>> > diligence for Flintrock’s users).
>> >
>> > Nick
>> >
>> >
>> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <[hidden email]>
>> > wrote:
>> >>
>> >> Using the hadoop-aws package is probably going to be a little more
>> >> complicated than that. The best bet is to use a custom build of Spark
>> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> looking at some nasty dependency issues, especially if you end up
>> >> mixing different versions of Hadoop.
>> >>
>> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> <[hidden email]> wrote:
>> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1
>> >> > RC4 using Flintrock. However, trying to load the hadoop-aws package
>> >> > gave me some errors.
>> >> >
>> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >
>> >> > <snipped>
>> >> >
>> >> > :: problems summary ::
>> >> > :::: WARNINGS
>> >> >   [NOT FOUND] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >   [NOT FOUND] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >   [NOT FOUND] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >   [NOT FOUND] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >     ==== local-m2-cache: tried
>> >> >       file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >
>> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>> >> > called make-distribution.sh with -Phadoop-2.8 so I’m not sure what
>> >> > else to try.
>> >> >
>> >> > Any quick pointers?
>> >> >
>> >> > Nick
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> Starting with my own +1 (binding).
>> >> >>



Re: [VOTE] Spark 2.3.1 (RC4)

Hyukjin Kwon
+1

On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida <[hidden email]> wrote:
+1 (non-binding) 




Re: [VOTE] Spark 2.3.1 (RC4)

John Zhuge-2
+1

On Sun, Jun 3, 2018 at 6:12 PM, Hyukjin Kwon <[hidden email]> wrote:
+1

--
John

Re: [VOTE] Spark 2.3.1 (RC4)

Mark Hamstra
In reply to this post by Marcelo Vanzin
+1

On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin <[hidden email]> wrote:

Re: [VOTE] Spark 2.3.1 (RC4)

Joseph Bradley
+1

On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra <[hidden email]> wrote:
+1

On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin <[hidden email]> wrote:




--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.
http://databricks.com
