[DISCUSS] Apache Spark 3.0.1 Release


[DISCUSS] Apache Spark 3.0.1 Release

Yuanjian Li

Hi dev-list,


I'd like to start a discussion on the feasibility of a Spark 3.0.1 release, since 4 blocker issues have been found since Spark 3.0.0:


  1. [SPARK-31990] State store compatibility is broken, which causes a correctness issue when a streaming query with `dropDuplicates` uses a checkpoint written by an older Spark version.

  2. [SPARK-32038] A regression in handling NaN values in COUNT(DISTINCT).

  3. [SPARK-31918][WIP] CRAN requires SparkR to work with the latest R 4.0. This issue makes the 3.0 release unavailable on CRAN, and the release only supports R [3.5, 4.0).

  4. [SPARK-31967] Downgrade vis.js to fix Jobs UI loading time regression
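
For readers wondering why NaN is awkward for distinct counting (item 2), here is a minimal Python sketch. It is not Spark code and does not reproduce SPARK-32038 itself; the function names are hypothetical and chosen only for illustration. Spark SQL's documented NaN semantics group all NaN values together in aggregations, so a distinct count should treat them as a single value, which the second function imitates.

```python
import math


def count_distinct_naive(values):
    # Hypothetical helper: each float("nan") is a separate object and
    # nan != nan, so a plain set keeps every NaN as its own element.
    return len(set(values))


def count_distinct_nan_aware(values):
    # Hypothetical helper: map every NaN to one shared sentinel key before
    # deduplicating, so all NaNs collapse into a single distinct value.
    nan_key = object()
    keys = {
        nan_key if isinstance(v, float) and math.isnan(v) else v
        for v in values
    }
    return len(keys)


values = [1.0, float("nan"), float("nan"), 2.0, 1.0]
print(count_distinct_naive(values))      # counts the two NaNs separately
print(count_distinct_nan_aware(values))  # counts all NaNs as one value
```

The sentinel-key approach is just one way to normalize NaN before comparison; the actual fix in Spark may work quite differently.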


I also noticed that branch-3.0 already has 39 commits on top of Spark 3.0.0. I think it would be great to have a Spark 3.0.1 release to deliver the critical fixes.


Any comments are appreciated.


Best,

Yuanjian



Re: [DISCUSS] Apache Spark 3.0.1 Release

Takeshi Yamamuro
Thanks for the heads-up, Yuanjian!

> I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
Wow, the updates are so quick. Anyway, +1 for the release.

Bests,
Takeshi

On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li <[hidden email]> wrote:

Hi dev-list,


I'd like to start a discussion on the feasibility of a Spark 3.0.1 release, since 4 blocker issues have been found since Spark 3.0.0:


  1. [SPARK-31990] State store compatibility is broken, which causes a correctness issue when a streaming query with `dropDuplicates` uses a checkpoint written by an older Spark version.

  2. [SPARK-32038] A regression in handling NaN values in COUNT(DISTINCT).

  3. [SPARK-31918][WIP] CRAN requires SparkR to work with the latest R 4.0. This issue makes the 3.0 release unavailable on CRAN, and the release only supports R [3.5, 4.0).

  4. [SPARK-31967] Downgrade vis.js to fix Jobs UI loading time regression


I also noticed that branch-3.0 already has 39 commits on top of Spark 3.0.0. I think it would be great to have a Spark 3.0.1 release to deliver the critical fixes.


Any comments are appreciated.


Best,

Yuanjian




--
---
Takeshi Yamamuro

Re: [DISCUSS] Apache Spark 3.0.1 Release

Shivaram Venkataraman
+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon.

Shivaram

On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro <[hidden email]> wrote:

>
> Thanks for the heads-up, Yuanjian!
>
> > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
> wow, the updates are so quick. Anyway, +1 for the release.
>
> Bests,
> Takeshi

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: [DISCUSS] Apache Spark 3.0.1 Release

rxin
+1 on doing a new patch release soon. I saw some of these issues when preparing the 3.0 release, and some of them are very serious.


On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <[hidden email]> wrote:

+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon.

Shivaram


Re: [DISCUSS] Apache Spark 3.0.1 Release

Holden Karau
+1 on a patch release soon

On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin <[hidden email]> wrote:
+1 on doing a new patch release soon. I saw some of these issues when preparing the 3.0 release, and some of them are very serious.



--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: [DISCUSS] Apache Spark 3.0.1 Release

Jules Damji-2
+1 (non-binding)

Sent from my iPhone
Pardon the dumb thumb typos :)

On Jun 23, 2020, at 11:36 AM, Holden Karau <[hidden email]> wrote:


+1 on a patch release soon


Re: [DISCUSS] Apache Spark 3.0.1 Release

Jungtaek Lim-2
+1 on a 3.0.1 soon.

It would be nice if some Scala experts could take a look at https://issues.apache.org/jira/browse/SPARK-32051 and include the fix in 3.0.1 if possible.
It looks like APIs designed to work with Scala 2.11 & Java introduce ambiguity with Scala 2.12 & Java.

On Wed, Jun 24, 2020 at 4:52 AM Jules Damji <[hidden email]> wrote:
+1 (non-binding)

Sent from my iPhone
Pardon the dumb thumb typos :)


Re: [DISCUSS] Apache Spark 3.0.1 Release

Dongjoon Hyun-2
+1

Bests,
Dongjoon.

On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim <[hidden email]> wrote:
+1 on a 3.0.1 soon.

It would be nice if some Scala experts could take a look at https://issues.apache.org/jira/browse/SPARK-32051 and include the fix in 3.0.1 if possible.
It looks like APIs designed to work with Scala 2.11 & Java introduce ambiguity with Scala 2.12 & Java.


Re: [DISCUSS] Apache Spark 3.0.1 Release

Hyukjin Kwon
+1.

Just as a note,
SPARK-31918 is fixed now, and there's no blocker. Note that when we build SparkR, we should use the latest R version, at least 4.0.0.

On Wed, Jun 24, 2020 at 11:20 AM Dongjoon Hyun <[hidden email]> wrote:
+1

Bests,
Dongjoon.


Re: [DISCUSS] Apache Spark 3.0.1 Release

Gengliang Wang
+1, the issues mentioned are really serious. 

On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon <[hidden email]> wrote:
+1.

Just as a note,
SPARK-31918 is fixed now, and there's no blocker. Note that when we build SparkR, we should use the latest R version, at least 4.0.0.


Re: [DISCUSS] Apache Spark 3.0.1 Release

Ruifeng Zheng
I volunteer to be the release manager for 3.0.1, if nobody is working on this.


------------------ Original Message ------------------
From: "Gengliang Wang"<[hidden email]>;
Sent: Wednesday, June 24, 2020, 4:15 PM
To: "Hyukjin Kwon"<[hidden email]>;
Cc: "Dongjoon Hyun"<[hidden email]>;"Jungtaek Lim"<[hidden email]>;"Jules Damji"<[hidden email]>;"Holden Karau"<[hidden email]>;"Reynold Xin"<[hidden email]>;"Shivaram Venkataraman"<[hidden email]>;"Yuanjian Li"<[hidden email]>;"Spark dev list"<[hidden email]>;"Takeshi Yamamuro"<[hidden email]>;
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

+1, the issues mentioned are really serious. 


Re: [DISCUSS] Apache Spark 3.0.1 Release

Prashant Sharma
+1 for 3.0.1 release.
I too can help out as release manager.

On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰 <[hidden email]> wrote:
I volunteer to be a release manager of 3.0.1, if nobody is working on this.



Re: [DISCUSS] Apache Spark 3.0.1 Release

Shivaram Venkataraman
Hi all

I just wanted to ping this thread to see if all the outstanding blockers for 3.0.1 have been fixed. If so, it would be great if we could get the release going. The CRAN team sent us a note that the version of SparkR available on CRAN is broken for the current R version (4.0.2), and hence we need to update the package soon -- it would be great to do it with 3.0.1.

Thanks
Shivaram

On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma <[hidden email]> wrote:
+1 for 3.0.1 release.
I too can help out as release manager.


Re: [DISCUSS] Apache Spark 3.0.1 Release

Jungtaek Lim-2
SPARK-32130 [1] looks to be a performance regression introduced in Spark 3.0.0, which would be ideal to look into before releasing another bugfix version.


On Wed, Jul 1, 2020 at 7:05 AM Shivaram Venkataraman <[hidden email]> wrote:
Hi all

I just wanted to ping this thread to see if all the outstanding blockers for 3.0.1 have been fixed. If so, it would be great if we could get the release going. The CRAN team sent us a note that the version of SparkR available on CRAN is broken for the current R version (4.0.2), and hence we need to update the package soon -- it would be great to do it with 3.0.1.

Thanks
Shivaram


Any comments are appreciated.

Best,

Yuanjian

--
---
Takeshi Yamamuro

--------------------------------------------------------------------- To unsubscribe e-mail: [hidden email]




--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: [DISCUSS] Apache Spark 3.0.1 Release

jm-2

Hi all,

Could I get some input on the severity of this one that I found yesterday? If that’s a correctness issue, should it block this patch? Let me know under the ticket if there’s more info that I can provide to help.

https://issues.apache.org/jira/browse/SPARK-32136

Thanks,

Jason.



Re: [DISCUSS] Apache Spark 3.0.1 Release

Holden Karau
I can take care of 2.4.7 unless someone else wants to do it.


Re: [DISCUSS] Apache Spark 3.0.1 Release

cloud0fan
Hi Jason,

Thanks for reporting! https://issues.apache.org/jira/browse/SPARK-32136 looks like a breaking change and we should investigate.


Re: [DISCUSS] Apache Spark 3.0.1 Release

Shivaram Venkataraman
In reply to this post by Holden Karau
Thanks Holden -- it would be great to also get 2.4.7 started

Thanks
Shivaram



Re: [DISCUSS] Apache Spark 3.0.1 Release

Jungtaek Lim-2
https://issues.apache.org/jira/browse/SPARK-32148 was reported yesterday, and if the report is valid it looks to be a blocker. I'll try to take a look soon.


Re: [DISCUSS] Apache Spark 3.0.1 Release

Xiao Li-2
+1 on releasing both 3.0.1 and 2.4.7

Great! Three committers have volunteered to be release manager: Ruifeng, Prashant, and Holden. Holden just helped release Spark 2.4.6, so this time maybe Ruifeng and Prashant can be the release managers of 3.0.1 and 2.4.7, respectively.

Xiao

On Wed, Jul 1, 2020 at 2:24 PM Jungtaek Lim <[hidden email]> wrote:
https://issues.apache.org/jira/browse/SPARK-32148 was reported yesterday, and if the report is valid it looks to be a blocker. I'll try to take a look sooner.

On Thu, Jul 2, 2020 at 12:48 AM Shivaram Venkataraman <[hidden email]> wrote:
Thanks Holden -- it would be great to also get 2.4.7 started

Thanks
Shivaram

On Tue, Jun 30, 2020 at 10:31 PM Holden Karau <[hidden email]> wrote:
>
> I can take care of 2.4.7 unless someone else wants to do it.
>
> On Tue, Jun 30, 2020 at 8:29 PM Jason Moore <[hidden email]> wrote:
>>
>> Hi all,
>>
>>
>>
>> Could I get some input on the severity of this one that I found yesterday?  If that’s a correctness issue, should it block this patch?  Let me know under the ticket if there’s more info that I can provide to help.
>>
>>
>>
>> https://issues.apache.org/jira/browse/SPARK-32136
>>
>>
>>
>> Thanks,
>>
>> Jason.
>>
>>
>>
>> From: Jungtaek Lim <[hidden email]>
>> Date: Wednesday, 1 July 2020 at 10:20 am
>> To: Shivaram Venkataraman <[hidden email]>
>> Cc: Prashant Sharma <[hidden email]>, 郑瑞峰 <[hidden email]>, Gengliang Wang <[hidden email]>, gurwls223 <[hidden email]>, Dongjoon Hyun <[hidden email]>, Jules Damji <[hidden email]>, Holden Karau <[hidden email]>, Reynold Xin <[hidden email]>, Yuanjian Li <[hidden email]>, "[hidden email]" <[hidden email]>, Takeshi Yamamuro <[hidden email]>
>> Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>>
>>
>> SPARK-32130 [1] looks to be a performance regression introduced in Spark 3.0.0, which would be ideal to look into before releasing another bugfix version.
>>
>>
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-32130
>>
>>
>>
>> On Wed, Jul 1, 2020 at 7:05 AM Shivaram Venkataraman <[hidden email]> wrote:
>>
>> Hi all
>>
>>
>>
>> I just wanted to ping this thread to see if all the outstanding blockers for 3.0.1 have been fixed. If so, it would be great if we can get the release going. The CRAN team sent us a note that the version of SparkR available on CRAN for the current R version (4.0.2) is broken and hence we need to update the package soon -- it will be great to do it with 3.0.1.
>>
>>
>>
>> Thanks
>>
>> Shivaram
>>
>>
>>
>> On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma <[hidden email]> wrote:
>>
>> +1 for 3.0.1 release.
>>
>> I too can help out as release manager.
>>
>>
>>
>> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰 <[hidden email]> wrote:
>>
>> I volunteer to be a release manager of 3.0.1, if nobody is working on this.
>>
>>
>>
>>
>>
>> ------------------ Original Message ------------------
>>
>> From: "Gengliang Wang"<[hidden email]>;
>>
>> Sent: Wednesday, June 24, 2020, 4:15 PM
>>
>> To: "Hyukjin Kwon"<[hidden email]>;
>>
>> Cc: "Dongjoon Hyun"<[hidden email]>;"Jungtaek Lim"<[hidden email]>;"Jules Damji"<[hidden email]>;"Holden Karau"<[hidden email]>;"Reynold Xin"<[hidden email]>;"Shivaram Venkataraman"<[hidden email]>;"Yuanjian Li"<[hidden email]>;"Spark dev list"<[hidden email]>;"Takeshi Yamamuro"<[hidden email]>;
>>
>> Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>>
>>
>> +1, the issues mentioned are really serious.
>>
>>
>>
>> On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon <[hidden email]> wrote:
>>
>> +1.
>>
>> Just as a note,
>> - SPARK-31918 is fixed now, and there's no blocker.
>> - When we build SparkR, we should use the latest R version at least 4.0.0+.
>>
>>
>>
>> On Wed, Jun 24, 2020 at 11:20 AM Dongjoon Hyun <[hidden email]> wrote:
>>
>> +1
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>> On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim <[hidden email]> wrote:
>>
>> +1 on a 3.0.1 soon.
>>
>>
>>
>> Probably it would be nice if some Scala experts can take a look at https://issues.apache.org/jira/browse/SPARK-32051 and include the fix into 3.0.1 if possible.
>>
>> Looks like APIs designed to work with Scala 2.11 & Java bring ambiguity in Scala 2.12 & Java.
>>
>>
>>
>> On Wed, Jun 24, 2020 at 4:52 AM Jules Damji <[hidden email]> wrote:
>>
>> +1 (non-binding)
>>
>>
>>
>> Sent from my iPhone
>>
>> Pardon the dumb thumb typos :)
>>
>>
>>
>> On Jun 23, 2020, at 11:36 AM, Holden Karau <[hidden email]> wrote:
>>
>> +1 on a patch release soon
>>
>>
>>
>> On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin <[hidden email]> wrote:
>>
>> +1 on doing a new patch release soon. I saw some of these issues when preparing the 3.0 release, and some of them are very serious.
>>
>>
>>
>>
>>
>> On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <[hidden email]> wrote:
>>
>> +1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon.
>>
>> Shivaram
>>
>> On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro <[hidden email]> wrote:
>>
>> Thanks for the heads-up, Yuanjian!
>>
>> > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
>>
>> wow, the updates are so quick. Anyway, +1 for the release.
>>
>> Bests,
>> Takeshi
>>
>> On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li <[hidden email]> wrote:
>>
>> Hi dev-list,
>>
>> I’m writing this to raise the discussion about Spark 3.0.1 feasibility since 4 blocker issues were found after Spark 3.0.0:
>>
>> [SPARK-31990] Broken state store compatibility will cause a correctness issue when a streaming query with `dropDuplicate` uses a checkpoint written by an older Spark version.
>>
>> [SPARK-32038] The regression bug in handling NaN values in COUNT(DISTINCT)
>>
>> [SPARK-31918][WIP] CRAN requires SparkR to work with the latest R 4.0. Without the fix, the 3.0 release is unavailable on CRAN and only supports R [3.5, 4.0)
>>
>> [SPARK-31967] Downgrade vis.js to fix Jobs UI loading time regression
>>
>> I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. I think it would be great if we have Spark 3.0.1 to deliver the critical fixes.
>>
>> Any comments are appreciated.
>>
>> Best,
>>
>> Yuanjian
>>
>> --
>> ---
>> Takeshi Yamamuro
>>

