Re: [DISCUSS] Apache Spark 3.0.1 Release

10 messages

Re: [DISCUSS] Apache Spark 3.0.1 Release

ruifengz
Hi all,
I am going to prepare the release of 3.0.1 RC1, with the help of Wenchen.


------------------ Original Message ------------------
From: "Jason Moore" <[hidden email]>
Sent: Thursday, 30 July 2020, 10:35 AM
To: "dev" <[hidden email]>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

Hi all,

 

Discussion around 3.0.1 seems to have trickled away. What is blocking the release process from kicking off? I can see some unresolved bugs raised against 3.0.0, but conversely there were quite a few critical correctness fixes waiting to be released.

 

Cheers,

Jason.

 

From: Takeshi Yamamuro <[hidden email]>
Date: Wednesday, 15 July 2020 at 9:00 am
To: Shivaram Venkataraman <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

 

> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.

I don't see any ongoing blockers in my area.

Thanks for the notification.

 

Bests,

Takeshi

 

On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <[hidden email]> wrote:

Hi, Yi.

 

Could you explain why you think that is a blocker? For the given example from the JIRA description,

 

spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)

 

Apache Spark 3.0.0 seems to work as follows.

 

scala> spark.version

res0: String = 3.0.0

 

scala> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

res1: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]: map<string,string>])),None,false,true)

 

scala> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")

 

scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect

res3: Array[org.apache.spark.sql.Row] = Array([1])
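For reference, the lambda registered as the "key" UDF can be exercised in plain Scala, outside Spark. This is a hypothetical standalone sketch (the `KeyUdfSketch` name is not part of the thread) showing that the function itself behaves as expected on the JIRA's input:

```scala
// Standalone check of the lambda registered as the "key" UDF above:
// it takes the first key of the map and parses it as an Int.
object KeyUdfSketch {
  val key: Map[String, String] => Int = m => m.keys.head.toInt

  def main(args: Array[String]): Unit = {
    // Same input as the JIRA example; the first key is "1".
    println(key(Map("1" -> "one", "2" -> "two")))  // prints 1
  }
}
```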

 

Could you provide a reproducible example?

 

Bests,

Dongjoon.

 

 

On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <[hidden email]> wrote:

 

On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <[hidden email]> wrote:

https://issues.apache.org/jira/browse/SPARK-32234 ?

On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman
<[hidden email]> wrote:
>
> Hi all
>
> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.
>
> Thanks
> Shivaram
>


 

--

---
Takeshi Yamamuro


Re: [DISCUSS] Apache Spark 3.0.1 Release

Jason Moore-2
Thank you so much!  Any update on getting the RC1 up for vote?

Jason.



From: 郑瑞峰 <[hidden email]>
Sent: Wednesday, 5 August 2020 12:54 PM
To: Jason Moore <[hidden email]>; Spark dev list <[hidden email]>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release
 


Re: [DISCUSS] Apache Spark 3.0.1 Release

Koert Kuipers
I noticed a commit today that seems to prepare for 3.0.1-rc1:
commit 05144a5c10cd37ebdbb55fde37d677def49af11f
Author: Ruifeng Zheng <[hidden email]>
Date:   Sat Aug 15 01:37:47 2020 +0000

    Preparing Spark release v3.0.1-rc1

So I tried to build Spark at that commit, and I get a failure in sql:

09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 77.0 failed 1 times; aborting job
[info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (306 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
[info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)

[error] Failed tests:
[error] org.apache.spark.sql.DataFrameSuite
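The ArithmeticException above is Decimal.toPrecision rejecting a value that needs 39 significant digits when the target type is Decimal(38, 18). Below is a minimal plain-Scala sketch of that bound check, with no Spark dependency; the object and method names are hypothetical, and Spark's actual implementation differs in detail:

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Sketch of a precision/scale check in the spirit of Decimal.toPrecision:
// a value fits Decimal(precision, scale) only if, after rescaling to the
// target scale, its total number of significant digits is within precision.
object DecimalBoundSketch {
  def fits(v: JBigDecimal, precision: Int, scale: Int): Boolean = {
    val rescaled = v.setScale(scale, RoundingMode.HALF_UP)
    rescaled.precision <= precision
  }

  def main(args: Array[String]): Unit = {
    // The failing value from the stack trace: 21 integer digits plus
    // 18 fractional digits = 39 significant digits, which exceeds 38.
    val overflow = new JBigDecimal("111111111111111111110.246")
    println(fits(new JBigDecimal("1.5"), 38, 18))  // true
    println(fits(overflow, 38, 18))                // false
  }
}
```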

On Thu, Aug 13, 2020 at 8:19 PM Jason Moore <[hidden email]> wrote:
Thank you so much!  Any update on getting the RC1 up for vote?

Jason.



From: 郑瑞峰 <[hidden email]>
Sent: Wednesday, 5 August 2020 12:54 PM
To: Jason Moore <[hidden email]>; Spark dev list <[hidden email]>
Subject: 回复: [DISCUSS] Apache Spark 3.0.1 Release
 
Hi all,
I am going to prepare the realease of 3.0.1 RC1, with the help of Wenchen.


------------------ 原始邮件 ------------------
发件人: "Jason Moore" <[hidden email]>;
发送时间: 2020年7月30日(星期四) 上午10:35
收件人: "dev"<[hidden email]>;
主题: Re: [DISCUSS] Apache Spark 3.0.1 Release

Hi all,

 

Discussion around 3.0.1 seems to have trickled away.  What was blocking the release process kicking off?  I can see some unresolved bugs raised against 3.0.0, but conversely there were quite a few critical correctness fixes waiting to be released.

 

Cheers,

Jason.

 

From: Takeshi Yamamuro <[hidden email]>
Date: Wednesday, 15 July 2020 at 9:00 am
To: Shivaram Venkataraman <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

 

> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.

I don't see any on-going blocker in my area.

Thanks for the notification.

 

Bests,

Tkaeshi

 

On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <[hidden email]> wrote:

Hi, Yi.

 

Could you explain why you think that is a blocker? For the given example from the JIRA description,

 

spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)

 

Apache Spark 3.0.0 seems to work like the following.

 

scala> spark.version

res0: String = 3.0.0

 

scala> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

res1: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]: map<string,string>])),None,false,true)

 

scala> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")

 

scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect

res3: Array[org.apache.spark.sql.Row] = Array([1])

 

Could you provide a reproducible example?

 

Bests,

Dongjoon.

 

 

On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <[hidden email]> wrote:

 

On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <[hidden email]> wrote:

https://issues.apache.org/jira/browse/SPARK-32234 ?

On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman
<[hidden email]> wrote:
>
> Hi all
>
> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.
>
> Thanks
> Shivaram
>


 

--

---
Takeshi Yamamuro


Re: [DISCUSS] Apache Spark 3.0.1 Release

Takeshi Yamamuro
I've checked the Jenkins log, and it seems the commit from https://github.com/apache/spark/pull/29404 caused the failure.



Re: [DISCUSS] Apache Spark 3.0.1 Release

ruifengz

Thanks for letting us know about this issue.



Re: [DISCUSS] Apache Spark 3.0.1 Release

wuyi
In reply to this post by ruifengz
Hi Ruifeng, thank you for your work. I have a backport PR for 3.0: https://github.com/apache/spark/pull/29395. It is waiting for tests now.

Best,
Yi



Re: [DISCUSS] Apache Spark 3.0.1 Release

Tom Graves-2
In reply to this post by ruifengz
Hey,

I'm just curious: what is the status of the 3.0.1 release? Do we have any blockers we are waiting on?

Thanks,
Tom

On Sunday, August 16, 2020, 09:07:44 PM CDT, ruifengz <[hidden email]> wrote:


Thanks for letting us know this issue.


On 8/16/20 11:31 PM, Takeshi Yamamuro wrote:
I've checked the Jenkins log and It seems the commit from https://github.com/apache/spark/pull/29404 caused the failure.


On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <[hidden email]> wrote:
i noticed commit today that seems to prepare for 3.0.1-rc1:
commit 05144a5c10cd37ebdbb55fde37d677def49af11f
Author: Ruifeng Zheng <[hidden email]>
Date:   Sat Aug 15 01:37:47 2020 +0000

    Preparing Spark release v3.0.1-rc1

so i tried to build spark on that commit and i get failure in sql:

09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 77.0 failed 1 times; aborting job
[info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (306 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
[info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)

[error] Failed tests:
[error] org.apache.spark.sql.DataFrameSuite

On Thu, Aug 13, 2020 at 8:19 PM Jason Moore [hidden email] wrote:
Thank you so much!  Any update on getting the RC1 up for vote?

Jason.



From: 郑瑞峰 <[hidden email]>
Sent: Wednesday, 5 August 2020 12:54 PM
To: Jason Moore [hidden email]; Spark dev list <[hidden email]>
Subject: 回复: [DISCUSS] Apache Spark 3.0.1 Release
 
Hi all,
I am going to prepare the realease of 3.0.1 RC1, with the help of Wenchen.


------------------ 原始邮件 ------------------
发件人: "Jason Moore" [hidden email];
发送时间: 2020年7月30日(星期四) 上午10:35
收件人: "dev"<[hidden email]>;
主题: Re: [DISCUSS] Apache Spark 3.0.1 Release

Hi all,

 

Discussion around 3.0.1 seems to have trickled away.  What was blocking the release process kicking off?  I can see some unresolved bugs raised against 3.0.0, but conversely there were quite a few critical correctness fixes waiting to be released.

 

Cheers,

Jason.

 

From: Takeshi Yamamuro <[hidden email]>
Date: Wednesday, 15 July 2020 at 9:00 am
To: Shivaram Venkataraman <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

 

> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.

I don't see any on-going blocker in my area.

Thanks for the notification.

 

Bests,

Tkaeshi

 

On Wed, Jul 15, 2020 at 4:03 AM Dongjoon Hyun <[hidden email]> wrote:

Hi, Yi.

 

Could you explain why you think that is a blocker? For the given example from the JIRA description,

 

spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)

 

Apache Spark 3.0.0 seems to work like the following.

 

scala> spark.version

res0: String = 3.0.0

 

scala> spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))

res1: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$1958/948653928@5d6bed7b,IntegerType,List(Some(class[value[0]: map<string,string>])),None,false,true)

 

scala> Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")

 

scala> sql("SELECT key(a) AS k FROM t GROUP BY key(a)").collect

res3: Array[org.apache.spark.sql.Row] = Array([1])

 

Could you provide a reproducible example?

 

Bests,

Dongjoon.

 

 

On Tue, Jul 14, 2020 at 10:04 AM Yi Wu <[hidden email]> wrote:

 

On Tue, Jul 14, 2020 at 11:13 PM Sean Owen <[hidden email]> wrote:

https://issues.apache.org/jira/browse/SPARK-32234 ?

On Tue, Jul 14, 2020 at 9:57 AM Shivaram Venkataraman
<[hidden email]> wrote:
>
> Hi all
>
> Just wanted to check if there are any blockers that we are still waiting for to start the new release process.
>
> Thanks
> Shivaram
>


 

--

---
Takeshi Yamamuro




Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Dongjoon Hyun-2
For the correctness blocker, we have the following, Tom.

- https://issues.apache.org/jira/browse/SPARK-32614
- https://github.com/apache/spark/pull/29516

Bests,
Dongjoon.

On Tue, Aug 25, 2020 at 6:32 AM Tom Graves <[hidden email]> wrote:
Hey,

I'm just curious what the status of the 3.0.1 release is. Do we have some blockers we are waiting on?

Thanks,
Tom

On Sunday, August 16, 2020, 09:07:44 PM CDT, ruifengz <[hidden email]> wrote:


Thanks for letting us know about this issue.


On 8/16/20 11:31 PM, Takeshi Yamamuro wrote:
I've checked the Jenkins log, and it seems the commit from https://github.com/apache/spark/pull/29404 caused the failure.


On Sat, Aug 15, 2020 at 10:43 PM Koert Kuipers <[hidden email]> wrote:
I noticed a commit today that seems to prepare for 3.0.1-rc1:
commit 05144a5c10cd37ebdbb55fde37d677def49af11f
Author: Ruifeng Zheng <[hidden email]>
Date:   Sat Aug 15 01:37:47 2020 +0000

    Preparing Spark release v3.0.1-rc1

So I tried to build Spark at that commit, and I get a failure in sql:

09:36:57.371 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 77.0 failed 1 times; aborting job
[info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (306 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.17, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
[info] at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2138)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)

[error] Failed tests:
[error] org.apache.spark.sql.DataFrameSuite

On Thu, Aug 13, 2020 at 8:19 PM Jason Moore <[hidden email]> wrote:
Thank you so much!  Any update on getting the RC1 up for vote?

Jason.



From: 郑瑞峰 <[hidden email]>
Sent: Wednesday, 5 August 2020 12:54 PM
To: Jason Moore [hidden email]; Spark dev list <[hidden email]>
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release
 

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Sean Owen-2
That isn't a blocker (see comments - not a regression).
That said, I think we have a fix ready to merge now, if there are no objections.

On Tue, Aug 25, 2020 at 10:24 AM Dongjoon Hyun <[hidden email]> wrote:
>
> For the correctness blocker, we have the following, Tom.
>
> - https://issues.apache.org/jira/browse/SPARK-32614
> - https://github.com/apache/spark/pull/29516
>
> Bests,
> Dongjoon.
>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

Yuming Wang

On Tue, Aug 25, 2020 at 11:25 PM Sean Owen <[hidden email]> wrote:
That isn't a blocker (see comments - not a regression).
That said I think we have a fix ready to merge now, if there are no objections.

On Tue, Aug 25, 2020 at 10:24 AM Dongjoon Hyun <[hidden email]> wrote:
>
> For the correctness blocker, we have the following, Tom.
>
> - https://issues.apache.org/jira/browse/SPARK-32614
> - https://github.com/apache/spark/pull/29516
>
> Bests,
> Dongjoon.
>
