Time for 2.3.2?


cloud0fan
Hi all,

Spark 2.3.1 was released just a while ago, but unfortunately we discovered and fixed some critical issues afterward.

SPARK-24495: SortMergeJoin may produce wrong result.
This is a serious correctness bug and is easy to hit: the same join key column from the left table appears more than once in the join condition, e.g. `WHERE t1.a = t2.b AND t1.a = t2.c`, and the join is executed as a sort merge join. This bug is only present in Spark 2.3.
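
For illustration, here is a minimal sketch (hypothetical data, not the exact reproducer from the JIRA) of the query shape that can hit it:

    // Hedged sketch: the same left-side column t1.a is compared against two
    // right-side columns, and broadcasting is disabled so the join is planned
    // as a sort merge join. On 2.3.0/2.3.1 the result may then be wrong.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("smj-shape").getOrCreate()
    import spark.implicits._

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")  // force a sort merge join

    val t1 = Seq(1, 1, 2).toDF("a")              // duplicated values for the left join key
    val t2 = Seq((1, 1), (2, 2)).toDF("b", "c")

    // t1.a appears twice in the join condition, matching the shape above
    val joined = t1.join(t2, t1("a") === t2("b") && t1("a") === t2("c"))
    joined.explain()  // expect SortMergeJoin in the physical plan
    joined.show()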

SPARK-24588: stream-stream join may produce wrong result
This is a correctness bug in a new feature of Spark 2.3: the stream-stream join. Users can hit this bug if one side of the join is partitioned by a subset of the join keys.
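
For reference, a hedged sketch of just this condition (hypothetical columns, not the reproducer from the JIRA): a stream-stream join on (key1, key2) where the left side has been repartitioned by key1 alone, i.e. by a subset of the join keys:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, expr}

    val spark = SparkSession.builder().appName("stream-join-shape").getOrCreate()

    // Two rate streams with the same pair of derived join keys
    def rateStream() = spark.readStream.format("rate").load()
      .withColumn("key1", expr("value % 10"))
      .withColumn("key2", expr("value % 5"))

    val left  = rateStream().repartition(col("key1"))  // partitioned by a subset of the join keys
    val right = rateStream()

    // Inner stream-stream join on both keys
    val joined = left.join(right, Seq("key1", "key2"))

    joined.writeStream.format("console").start().awaitTermination()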

SPARK-24552: Task attempt numbers are reused when stages are retried
This is a long-standing bug in the output committer that may introduce data corruption.

SPARK-24542: UDFXPathXXXX allow users to pass carefully crafted XML to access arbitrary files
This is a potential security issue if users build an access control module on top of Spark.
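
For context, these are the built-in XPath UDFs (xpath, xpath_string, xpath_boolean, etc.). A benign usage sketch, just to show which functions are involved:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("xpath-udfs").getOrCreate()

    // Ordinary usage; the concern is that, before the fix, carefully crafted
    // XML passed to these functions could reference arbitrary files on the host.
    spark.sql("SELECT xpath_string('<a><b>hello</b></a>', 'a/b')").show()  // returns "hello"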

I think we need a Spark 2.3.2 release to address these issues (especially the correctness bugs) ASAP. Any thoughts?

Thanks,
Wenchen

Re: Time for 2.3.2?

Marcelo Vanzin-2
+1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
for those out.

(Those are what delayed 2.2.2 and 2.1.3 for those watching...)



Re: Time for 2.3.2?

Saisai Shao
+1. As Marcelo mentioned, these issues seem quite severe.

I can work on the release if we're short of hands :).

Thanks
Jerry




Re: Time for 2.3.2?

cloud0fan
Hi Saisai, that's great! Please go ahead!



Re: Time for 2.3.2?

Jiang Xingbo
+1



Re: Time for 2.3.2?

Takeshi Yamamuro
+1, I heard some Spark users have skipped v2.3.1 because of these bugs.


Re: Time for 2.3.2?

Marco Gaido
+1 too. I'd also consider including SPARK-24208 if we can solve it in time...



Re: Time for 2.3.2?

Stavros Kontopoulos-3
+1 makes sense.

--
Stavros Kontopoulos
Senior Software Engineer
Lightbend, Inc.
p: +30 6977967274

Re: Time for 2.3.2?

Felix Cheung
+1

I’d like to fix SPARK-24535 first though



Re: Time for 2.3.2?

Marcelo Vanzin-2
Could you mark that bug as a blocker and set the target version, in that case?


Re: Time for 2.3.2?

Felix Cheung
Yep, will do



Re: Time for 2.3.2?

Xiao Li
+1. Thanks, Saisai!  

The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP. 

Thanks, 

Xiao


Re: Time for 2.3.2?

Ryan Blue
+1

--
Ryan Blue
Software Engineer
Netflix

Re: Time for 2.3.2?

gvramana
+1. Need to release Spark 2.3.2 ASAP

Thanks,
Venkata Ramana Gollamudi





Re: Time for 2.3.2?

Yu, Yucai-2

+1. We are evaluating 2.3.1, please release Spark 2.3.2 ASAP.


Thanks,

Yucai


Re: Time for 2.3.2?

John Zhuge
+1  Looking forward to the critical fixes in 2.3.2.


Re: Time for 2.3.2?

Saisai Shao
I will start preparing the release.

Thanks


Re: Time for 2.3.2?

Saisai Shao
FYI, we currently have one blocker issue (https://issues.apache.org/jira/browse/SPARK-24535); I will start the release after it is fixed.

Also, please let me know if there are other blockers or fixes that need to land in the 2.3.2 release.

Thanks
Saisai
