[VOTE] SPARK 2.4.0 (RC4)

[VOTE] SPARK 2.4.0 (RC4)

cloud0fan
Please vote on releasing the following candidate as Apache Spark version 2.4.0.

The vote is open until October 26 PST and passes if a majority of +1 PMC votes are cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc4 (commit e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
https://github.com/apache/spark/tree/v2.4.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1290

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate, then
reporting any regressions.
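
You can also verify the staged artifacts against the signatures and
digests before testing. A minimal sketch (the hadoop2.7 tarball name is
just an example of one of the files under the -bin/ directory):

# import the Spark release-signing keys
curl -O https://dist.apache.org/repos/dist/dev/spark/KEYS
gpg --import KEYS
# fetch an artifact and its detached signature, then verify
curl -O https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/spark-2.4.0-bin-hadoop2.7.tgz
curl -O https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/spark-2.4.0-bin-hadoop2.7.tgz.asc
gpg --verify spark-2.4.0-bin-hadoop2.7.tgz.asc spark-2.4.0-bin-hadoop2.7.tgz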

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before and after so
you don't end up building with an out-of-date RC going forward).
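
For example, a quick PySpark smoke test could look like this (a sketch;
the pyspark tarball name under the -bin/ directory is an assumption,
adjust to whatever is actually staged):

# create an isolated environment and install the RC's pyspark package
python -m venv rc-test && source rc-test/bin/activate
pip install https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/pyspark-2.4.0.tar.gz
# confirm the version imports cleanly
python -c "import pyspark; print(pyspark.__version__)"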

===========================================
What should happen to JIRA tickets still targeting 2.4.0?
===========================================

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.4.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else should be retargeted to an
appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.

Re: [VOTE] SPARK 2.4.0 (RC4)

cloud0fan
Since GitHub and Jenkins are in a chaotic state, I didn't wait for a green Jenkins QA job for the RC4 commit. We should fail this RC if the Jenkins job turns out to be broken (very unlikely).

I'm adding my own +1; all known blockers are resolved.

On Tue, Oct 23, 2018 at 1:42 AM Wenchen Fan <[hidden email]> wrote:
> ...

Re: [VOTE] SPARK 2.4.0 (RC4)

Sean Owen-2
In reply to this post by cloud0fan
Provisionally looking good to me, but I had a few questions.

We have these open for 2.4, but I presume they aren't actually going
to be in 2.4 and should be untargeted:

SPARK-25507 Update documents for the new features in 2.4 release
SPARK-25179 Document the features that require Pyarrow 0.10
SPARK-25783 Spark shell fails because of jline incompatibility
SPARK-25347 Document image data source in doc site
SPARK-25584 Document libsvm data source in doc site
SPARK-25346 Document Spark builtin data sources
SPARK-24464 Unit tests for MLlib's Instrumentation
SPARK-23197 Flaky test: spark.streaming.ReceiverSuite."receiver_life_cycle"
SPARK-22809 pyspark is sensitive to imports with dots
SPARK-21030 extend hint syntax to support any expression for Python and R

Comments in several of the doc issues suggest they are needed for 2.4
though. How essential?

(Brief digression: SPARK-21030 is an example of a pattern I see
sometimes. Parent Epic A is targeted for version X. Children B and C
are not. Epic A's description is basically "do X and Y". Is the parent
helping? And now that Y is done, is there a point in tracking X with
two JIRAs? Can I just close the Epic?)

I am not sure I've tried running K8S in my test runs before, but I get
this on my Linux machine:

[INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @
spark-kubernetes-integration-tests_2.12 ---
fatal: not a git repository (or any of the parent directories): .git
tar (child): --strip-components=1: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
scripts/setup-integration-test-env.sh: line 85:
/home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
No such file or directory
/home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 289 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
  java.lang.NullPointerException:
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:92)
  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
  at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org$scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:39)
  at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
  at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.run(KubernetesSuite.scala:39)
  at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
  at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
  ...

Clearly it's expecting something about the env that isn't true, but I
don't know if it's a problem with those expectations versus what is in
the source release, or just something to do with my env. This is with
Scala 2.12.



On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <[hidden email]> wrote:

> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

Stavros Kontopoulos-3
 
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> scripts/setup-integration-test-env.sh: line 85:
> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:

It seems you are missing the distro file... here is how I run it locally:

# Set these to your own Docker Hub account and a tag for the test images
DOCKER_USERNAME=...
SPARK_K8S_IMAGE_TAG=...

# Build a Spark distribution with the Kubernetes profile enabled
./dev/make-distribution.sh --name test --tgz -Phadoop-2.7 -Pkubernetes -Phive
# Unpack it and build the Spark Docker images with the distro's own tooling
tar -zxvf spark-2.4.0-SNAPSHOT-bin-test.tgz
cd spark-2.4.0-SNAPSHOT-bin-test
./bin/docker-image-tool.sh -r $DOCKER_USERNAME -t $SPARK_K8S_IMAGE_TAG build
cd ..
# Point the integration tests at the distribution tarball and the images
TGZ_PATH=$(pwd)/spark-2.4.0-SNAPSHOT-bin-test.tgz
cd resource-managers/kubernetes/integration-tests
./dev/dev-run-integration-tests.sh --image-tag $SPARK_K8S_IMAGE_TAG --spark-tgz $TGZ_PATH --image-repo $DOCKER_USERNAME

Stavros

On Tue, Oct 23, 2018 at 1:54 AM, Sean Owen <[hidden email]> wrote:
> ...

Re: [VOTE] SPARK 2.4.0 (RC4)

Sean Owen-2
This is what I got from a straightforward build of the source distro
here ... really, ideally, it builds as-is from source. You're saying
someone would have to first build a k8s distro from source too?
It's not a 'must' that this be automatic but nothing else fails out of the box.
I feel like I might be misunderstanding the setup here.
On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
<[hidden email]> wrote:

> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

cloud0fan
Regarding the doc tickets, I vaguely remember that we can merge doc PRs after the release and publish the docs to the Spark website later. Can anyone confirm?

On Tue, Oct 23, 2018 at 8:30 AM Sean Owen <[hidden email]> wrote:
> ...

Re: [VOTE] SPARK 2.4.0 (RC4)

Imran Rashid-4
In reply to this post by cloud0fan
+1
No blockers and our internal tests are all passing.

(I did file https://issues.apache.org/jira/browse/SPARK-25805, but this is just a minor issue with a flaky test)

On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <[hidden email]> wrote:
> ...

Re: [VOTE] SPARK 2.4.0 (RC4)

Aron.tao
In reply to this post by cloud0fan
+1





Re: [VOTE] SPARK 2.4.0 (RC4)

Darcy Shen-2
In reply to this post by cloud0fan


+1


---- On Tue, 23 Oct 2018 01:42:06 +0800 Wenchen Fan<[hidden email]> wrote ----

> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

Hyukjin Kwon
I am sorry for raising this late. Out of curiosity, does anyone know why we don't treat SPARK-24935 (https://github.com/apache/spark/pull/22144) as a blocker?

It looks like it broke API compatibility, as well as an actual use case of an external library (https://github.com/DataSketches/sketches-hive).


On Tue, Oct 23, 2018 at 12:03 PM, Darcy Shen <[hidden email]> wrote:


> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

Sean Owen-2
In reply to this post by cloud0fan
No, because the docs are built into the release and published to
the site from the released artifact.
As a practical matter, I think these docs are not critical for
release, and can follow in a maintenance release. I'd retarget to
2.4.1 or untarget.
I do know at times a release's docs have been edited after the fact,
but that's bad form. We'd not go change a class in the release after
it was released and call it the same release.

I'd still like some confirmation that someone can build and pass tests
with -Pkubernetes, maybe? It actually all passed with the 2.11 build.
I don't think it's a 2.12 incompatibility, but rather that the K8S
tests maybe don't quite work with the 2.12 build artifact naming. Or
else something to do with my env.
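
(For reference, a 2.12 build from the source release looks roughly
like the following — a sketch of the documented steps:

# switch the POMs to Scala 2.12, then build with the matching profile
./dev/change-scala-version.sh 2.12
./build/mvn -Pscala-2.12 -Pkubernetes clean install
)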

On Mon, Oct 22, 2018 at 9:08 PM Wenchen Fan <[hidden email]> wrote:

>
> Regarding the doc tickets, I vaguely remember that we can merge doc PRs after release and publish doc to spark website later. Can anyone confirm?
> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

Stavros Kontopoulos-3
Sean, 

I will try it against 2.12 shortly. 

> You're saying someone would have to first build a k8s distro from source too?

Ok I missed the error one line above, before the distro error there is another one:

fatal: not a git repository (or any of the parent directories): .git

So that seems to come from here. It seems that the test root is not set up correctly. It should be the top git dir from which you built Spark.

Now regarding the distro thing: dev-run-integration-tests.sh should run from within the cloned project after the distro is built. The distro is required; the script should fail otherwise.

Integration tests run the setup-integration-test-env.sh script: dev-run-integration-tests.sh calls mvn, which in turn executes that setup script.

How do you run the tests?

Stavros

On Tue, Oct 23, 2018 at 3:01 PM, Sean Owen <[hidden email]> wrote:
> ...



--
Stavros Kontopoulos
Senior Software Engineer
Lightbend, Inc.
p: +30 6977967274

Re: [VOTE] SPARK 2.4.0 (RC4)

Hyukjin Kwon
I am searching through PRs and JIRAs that mention regressions. Let me leave a link; it might be good to double-check https://github.com/apache/spark/pull/22514 as well.

On Tue, Oct 23, 2018 at 11:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
> ...

Re: [VOTE] SPARK 2.4.0 (RC4)

Sean Owen-2
In reply to this post by Stavros Kontopoulos-3
Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context.

I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is that a "mvn -P... -P... install" works from a source release; that's a good expectation and consistent with the docs.

Maybe these tests simply don't need to run with the normal suite of tests, and can be considered tests run manually by developers running these scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?

I don't think this has to block the release even if so, just trying to get to the bottom of it.
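
If we go that route, one stopgap might be to leave the module out of a
plain build with Maven's reactor exclusion (a sketch, untested; needs
Maven 3.2.1+ for the "!" syntax):

# run the normal build and tests, minus the k8s integration-tests module
mvn -Pkubernetes clean install -pl '!resource-managers/kubernetes/integration-tests'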


On Tue, Oct 23, 2018 at 10:58 AM Stavros Kontopoulos <[hidden email]> wrote:
> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

cloud0fan
In reply to this post by Stavros Kontopoulos-3
I read through the contributing guide; it only mentions that data correctness and data loss issues should be marked as blockers. AFAIK we also mark regressions of the current release as blockers, but not regressions from previous releases.

SPARK-24935 is indeed a bug, and is a regression from Spark 2.2.0. We should definitely fix it, but it doesn't seem like a blocker. BTW the root cause of SPARK-24935 is unknown (at least I can't tell from the PR), so fixing it might take a while.

On Tue, Oct 23, 2018 at 11:58 PM Stavros Kontopoulos <[hidden email]> wrote:
> ...

Re: [VOTE] SPARK 2.4.0 (RC4)

Sean Owen-2
In reply to this post by Sean Owen-2
(I should add, I only observed this with the Scala 2.12 build. It all
seemed to work with 2.11. Therefore I'm not too worried about it. I
don't think it's a Scala version issue, but perhaps something looking
for a spark 2.11 tarball and not finding it. See
https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
a change that might address this kind of thing.)

On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <[hidden email]> wrote:
> ...


Re: [VOTE] SPARK 2.4.0 (RC4)

Xiao Li
Thanks for reporting this. https://github.com/apache/spark/pull/22514 is not a blocker. We can fix it in the next minor release if we are unable to get it into this one.

Thanks, 

Xiao

On Tue, Oct 23, 2018 at 9:14 AM, Sean Owen <[hidden email]> wrote:
(I should add, I only observed this with the Scala 2.12 build. It all
seemed to work with 2.11. Therefore I'm not too worried about it. I
don't think it's a Scala version issue, but perhaps something looking
for a spark 2.11 tarball and not finding it. See
https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
a change that might address this kind of thing.)

On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <[hidden email]> wrote:
>
> Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context.
>
> I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is for a "mvn -P... -P... install" to work from a source release; that's a good expectation and consistent with docs.
>
> Maybe these tests simply don't need to run with the normal suite of tests, and can be considered tests run manually by developers running these scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>
> I don't think this has to block the release even if so, just trying to get to the bottom of it.



Re: [VOTE] SPARK 2.4.0 (RC4)

Xiao Li
https://github.com/apache/spark/pull/22144 is also not a blocker for the Spark 2.4 release, as discussed in the PR.

Thanks,

Xiao

On Tue, Oct 23, 2018 at 9:20 AM, Xiao Li <[hidden email]> wrote:
Thanks for reporting this. https://github.com/apache/spark/pull/22514 is not a blocker. We can fix it in the next minor release if we are unable to get it into this one.

Thanks, 

Xiao

On Tue, Oct 23, 2018 at 9:14 AM, Sean Owen <[hidden email]> wrote:
(I should add, I only observed this with the Scala 2.12 build. It all
seemed to work with 2.11. Therefore I'm not too worried about it. I
don't think it's a Scala version issue, but perhaps something looking
for a spark 2.11 tarball and not finding it. See
https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
a change that might address this kind of thing.)

On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <[hidden email]> wrote:
>
> Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context.
>
> I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is for a "mvn -P... -P... install" to work from a source release; that's a good expectation and consistent with docs.
>
> Maybe these tests simply don't need to run with the normal suite of tests, and can be considered tests run manually by developers running these scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>
> I don't think this has to block the release even if so, just trying to get to the bottom of it.



Re: [VOTE] SPARK 2.4.0 (RC4)

Hyukjin Kwon
https://github.com/apache/spark/pull/22514 sounds like a regression that affects Hive CTAS in the write path (such CTAS queries are no longer converted to Spark's internal data sources, hence a performance regression),
but yeah, I am not sure we should block the release on this.
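For anyone not following that PR, an illustrative example of the
affected pattern (the table and query are made up, not from the PR):

# a Hive-serde CTAS; when the conversion is skipped, the write goes
# through Hive SerDes instead of Spark's native Parquet writer
spark-sql -e "CREATE TABLE t STORED AS PARQUET AS SELECT 1 AS id"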

https://github.com/apache/spark/pull/22144 is just being discussed if I am not mistaken.

Thanks.

On Wed, Oct 24, 2018 at 12:27 AM, Xiao Li <[hidden email]> wrote:
https://github.com/apache/spark/pull/22144 is also not a blocker for the Spark 2.4 release, as discussed in the PR.

Thanks,

Xiao

On Tue, Oct 23, 2018 at 9:20 AM, Xiao Li <[hidden email]> wrote:
Thanks for reporting this. https://github.com/apache/spark/pull/22514 is not a blocker. We can fix it in the next minor release if we are unable to get it into this one.

Thanks, 

Xiao

On Tue, Oct 23, 2018 at 9:14 AM, Sean Owen <[hidden email]> wrote:
(I should add, I only observed this with the Scala 2.12 build. It all
seemed to work with 2.11. Therefore I'm not too worried about it. I
don't think it's a Scala version issue, but perhaps something looking
for a spark 2.11 tarball and not finding it. See
https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
a change that might address this kind of thing.)

On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <[hidden email]> wrote:
>
> Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context.
>
> I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is for a "mvn -P... -P... install" to work from a source release; that's a good expectation and consistent with docs.
>
> Maybe these tests simply don't need to run with the normal suite of tests, and can be considered tests run manually by developers running these scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>
> I don't think this has to block the release even if so, just trying to get to the bottom of it.



Re: [VOTE] SPARK 2.4.0 (RC4)

Stavros Kontopoulos-3
Sean,

OK, makes sense; I'm using a cloned repo. I built with the Scala 2.12 profile using the related tag v2.4.0-rc4:

./dev/change-scala-version.sh 2.12
./dev/make-distribution.sh  --name test --r --tgz -Pscala-2.12 -Psparkr -Phadoop-2.7 -Pkubernetes -Phive
I pushed the images to Docker Hub (see previous email) since I didn't use the minikube docker daemon (the default behavior).
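For completeness, a sketch of how those images can be built and pushed
with the tool that ships in the distro (the repo and tag match the test
flags below):

./bin/docker-image-tool.sh -r skonto -t k8s-scala-12 build   # from the distro dir
./bin/docker-image-tool.sh -r skonto -t k8s-scala-12 push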

Then I ran the tests successfully against minikube:

TGZ_PATH=$(pwd)/spark-2.4.0-bin-test.tgz
cd resource-managers/kubernetes/integration-tests

./dev/dev-run-integration-tests.sh --spark-tgz $TGZ_PATH --service-account default --namespace default --image-tag k8s-scala-12 --image-repo skonto


[INFO] 
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 229 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 5 minutes, 24 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM 2.4.0 ..................... SUCCESS [  4.491 s]
[INFO] Spark Project Tags ................................. SUCCESS [  3.833 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  2.680 s]
[INFO] Spark Project Networking ........................... SUCCESS [  4.817 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  2.541 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  2.795 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  5.593 s]
[INFO] Spark Project Core ................................. SUCCESS [ 25.160 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.0 ... SUCCESS [05:30 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:23 min
[INFO] Finished at: 2018-10-23T18:39:11Z
[INFO] ------------------------------------------------------------------------


but I had to modify this line and add -Pscala-2.12, otherwise it fails (these tests inherit from the parent pom, but the profile is not propagated to the mvn command that launches the tests; I can create a PR to fix that).
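A hedged sketch of the manual workaround; the real mvn invocation inside
dev-run-integration-tests.sh takes more flags (tgz path, image repo/tag),
so treat this as the shape of the fix, not the exact command:

cd resource-managers/kubernetes/integration-tests
mvn -Pscala-2.12 integration-test   # the profile must reach this inner mvn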


On Tue, Oct 23, 2018 at 7:44 PM, Hyukjin Kwon <[hidden email]> wrote:
https://github.com/apache/spark/pull/22514 sounds like a regression that affects Hive CTAS in the write path (such CTAS queries are no longer converted to Spark's internal data sources, hence a performance regression),
but yeah, I am not sure we should block the release on this.

https://github.com/apache/spark/pull/22144 is just being discussed if I am not mistaken.

Thanks.

On Wed, Oct 24, 2018 at 12:27 AM, Xiao Li <[hidden email]> wrote:
https://github.com/apache/spark/pull/22144 is also not a blocker for the Spark 2.4 release, as discussed in the PR.

Thanks,

Xiao

On Tue, Oct 23, 2018 at 9:20 AM, Xiao Li <[hidden email]> wrote:
Thanks for reporting this. https://github.com/apache/spark/pull/22514 is not a blocker. We can fix it in the next minor release if we are unable to get it into this one.

Thanks, 

Xiao

On Tue, Oct 23, 2018 at 9:14 AM, Sean Owen <[hidden email]> wrote:
(I should add, I only observed this with the Scala 2.12 build. It all
seemed to work with 2.11. Therefore I'm not too worried about it. I
don't think it's a Scala version issue, but perhaps something looking
for a spark 2.11 tarball and not finding it. See
https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
a change that might address this kind of thing.)

On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <[hidden email]> wrote:
>
> Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context.
>
> I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is for a "mvn -P... -P... install" to work from a source release; that's a good expectation and consistent with docs.
>
> Maybe these tests simply don't need to run with the normal suite of tests, and can be considered tests run manually by developers running these scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>
> I don't think this has to block the release even if so, just trying to get to the bottom of it.





--
Stavros Kontopoulos
Senior Software Engineer
Lightbend, Inc.
<a href="tel:%2B1%20650%20678%200020" value="+16506780020" target="_blank">p:  +30 6977967274