Spark 2.4.5 RC2 Preparation Status

Spark 2.4.5 RC2 Preparation Status

Dongjoon Hyun-2
Hi, All.

RC2 was scheduled for today, and all RC1 feedback seems to have been addressed.
However, I'm waiting for another ongoing correctness PR.

    https://github.com/apache/spark/pull/27233
    [SPARK-29701][SQL] Correct behaviours of group analytical queries when empty input given

Unlike the other correctness issues (which I sent previously), this one is active enough to make RC2 fail. As we know, the Spark 2.4.5 RC1 vote failed because a correctness patch landed on the `master` branch during the RC1 vote period and there were official requests to backport it.

    https://github.com/apache/spark/pull/27229
    [SPARK-29708][SQL][2.4] Correct aggregated values when grouping sets are duplicated

It's risky to start RC2 without considering it, because a VOTE also consumes community resources.

BTW, if there is another ongoing notable PR for 2.4.5 RC1, please reply to me.

Thanks,
Dongjoon.

Re: Spark 2.4.5 RC2 Preparation Status

Nicholas Marion

Hello,

I was wondering whether RC2 is expected to be released soon. Is there any chance that https://issues.apache.org/jira/browse/SPARK-30310 could be added to branch-2.4 as well for the 2.4.5 release, especially since 2.4.x introduced the bug?


Regards,

NICHOLAS T. MARION
IBM Open Data Analytics for z/OS -
CPO and Service Team Lead

Phone: 1-845-433-5010 | Tie-Line: 293-5010
E-mail:
[hidden email]
Find me on:
LinkedIn: http://www.linkedin.com/in/nicholasmarion
IBM

2455 South Rd
Poughkeepsie, New York 12601-5400
United States



Re: Spark 2.4.5 RC2 Preparation Status

Dongjoon Hyun-2
Hi, Nicholas and all.

RC2 is blocked by the community policy on correctness/dataloss issues.

I cut RC1 when there were no correctness/dataloss issues targeting 2.4.5. However, it failed because one correctness issue (target = 3.0.0) was resolved and the community changed its target to 2.4.5 on the last day of the RC1 vote.

As of now, there is a correctness issue targeting 2.4.5. As a release manager, I cannot cut RC2 until there are no correctness/dataloss issues with target=2.4.5. We need to either fix it or move the target version to 2.4.6.

That's the current situation. I'm trying to follow the existing community policies, but they seem too idealistic for the release. I'm trying to figure out what the best option for 2.4.5 is in the community. Hopefully, we can at least start RC2 without known risks.

For non-correctness issues, it's up to the progress and decision on them. Those issues are not blockers.
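
The gating rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Issue` record, the label names in `BLOCKING_LABELS`, and the `EXAMPLE-*` keys are hypothetical and are not taken from JIRA or from any Spark tooling.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Labels treated as release-blocking. The exact label names are an assumption.
BLOCKING_LABELS = {"correctness", "data-loss"}


@dataclass
class Issue:
    """Hypothetical, minimal stand-in for a tracked issue."""
    key: str
    labels: Set[str] = field(default_factory=set)
    target_versions: Set[str] = field(default_factory=set)
    resolved: bool = False


def blocking_issues(issues: List[Issue], release: str) -> List[Issue]:
    """Unresolved correctness/data-loss issues that explicitly target `release`."""
    return [
        issue for issue in issues
        if not issue.resolved
        and release in issue.target_versions
        and issue.labels & BLOCKING_LABELS
    ]


def can_cut_rc(issues: List[Issue], release: str) -> bool:
    """An RC can be cut only when no blocking issue targets the release."""
    return not blocking_issues(issues, release)


issues = [
    Issue("EXAMPLE-1", {"correctness"}, {"2.4.5"}),  # blocks 2.4.5
    Issue("EXAMPLE-2", {"correctness"}, {"3.0.0"}),  # targets another release
    Issue("EXAMPLE-3", {"docs"}, {"2.4.5"}),         # not a correctness issue
]
print(can_cut_rc(issues, "2.4.5"))
```

Under this sketch, either resolving `EXAMPLE-1` or moving its target to 2.4.6 unblocks the release, which mirrors the two options named above.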

Bests,
Dongjoon.




Re: Spark 2.4.5 RC2 Preparation Status

Sean Owen-2
OK, what, if anything, is in question for 2.4.5? I don't see anything open and targeted for it.
Are we talking about https://issues.apache.org/jira/browse/SPARK-28344 - targeted for 2.4.5 but not backported, and a 'correctness' issue?
Simply: who argues this must hold up 2.4.5, and if so what's the status?




Re: Spark 2.4.5 RC2 Preparation Status

Dongjoon Hyun-2
Great, Sean.

Then, what are your criteria for removing the 2.4.5 target from it?

It doesn't depend on `Who`, right?

Bests,
Dongjoon.





Re: Spark 2.4.5 RC2 Preparation Status

Sean Owen-2
I have no opinion - just figuring out the status too.

I guess I'm asking first, is this the only issue in question?

Does nobody object to untargeting it? -> then we are done for 2.4.5, right?
If anyone does -> what's the next step to resolving it?

It wasn't clear to me from the JIRA / PR whether you are arguing it should or should not be untargeted, or really who is arguing what about it.





RE: Spark 2.4.5 RC2 Preparation Status

Nicholas Marion

Thanks Dongjoon for the information.

I was just wondering what was blocking the delivery and, if there was a major blocker, whether the fix I had mentioned (https://issues.apache.org/jira/browse/SPARK-30310) could get pulled into the 2.4.x branch.

Thanks for any suggestions.


Regards,

NICHOLAS T. MARION



Re: Spark 2.4.5 RC2 Preparation Status

Dongjoon Hyun-2
Thank you again for helping us make progress, Sean.

    > Does nobody object to untargeting it? -> then we are done for 2.4.5, right?

I hope so. Unfortunately, however, there are more issues without target versions. For example, in the email thread "Correctness and data loss issues", I asked for opinions on 2.4.5 specifically and got responses like the following. We didn't reach an agreement on these.

- https://lists.apache.org/thread.html/r8f026c768ed53f3abb11a1b588162864dd9c7fb170ad0941ec33e2d8%40%3Cdev.spark.apache.org%3E

    > SPARK-28125 dataframes created by randomSplit have overlapping rows
    >     Seems like something we should fix
    > SPARK-28067 Incorrect results in decimal aggregation with whole-stage code gen enabled
    >     Seems like we should fix

Here, I'm trying to narrow our focus to the issues with an explicit `Target Version` and continue with the release. In other words, as a release manager, I hope I can officially ignore the other correctness issues that do not explicitly target 2.4.5.

Most correctness issues are long-standing and cause behavior changes. During a maintenance RC vote, I hope we set the Target Version of those kinds of issues to `2.4.6` instead of vetoing the RC. It's the same policy as for Fix Version: during the RC vote period, Fix Version is set to the next version `2.4.6` instead of the current RC `2.4.5`. Since maintenance releases happen more frequently, I believe that's okay.
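
As a rough illustration of the retargeting proposal, the following Python sketch moves an explicit target on the RC under vote to the next maintenance version. The function names are hypothetical, and the sketch assumes plain three-part `major.minor.patch` version strings.

```python
from typing import Set


def next_maintenance_version(version: str) -> str:
    """Bump the patch component, e.g. 2.4.5 becomes 2.4.6."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def retarget_during_vote(target_versions: Set[str], rc_version: str) -> Set[str]:
    """Replace a target on the RC under vote with the next maintenance release.

    Targets on other versions are left untouched.
    """
    if rc_version not in target_versions:
        return set(target_versions)
    bumped = next_maintenance_version(rc_version)
    return (set(target_versions) - {rc_version}) | {bumped}


print(sorted(retarget_during_vote({"2.4.5", "3.0.0"}, "2.4.5")))
```

The issue stays tracked for the branch but no longer gates the release candidate being voted on.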

Nicholas, please make a PR if you think so. We will review it there.

Bests,
Dongjoon.





RE: Spark 2.4.5 RC2 Preparation Status

Nicholas Marion

Thanks.

Created:
https://github.com/apache/spark/pull/27384


Regards,

NICHOLAS T. MARION
IBM Open Data Analytics for z/OS -
CPO and Service Team Lead

Phone: 1-845-433-5010 | Tie-Line: 293-5010
E-mail:
[hidden email]
Find me on:
LinkedIn: http://www.linkedin.com/in/nicholasmarion
IBM

2455 South Rd
Poughkeepie, New York 12601-5400
United States
IBM Redbooks Silver AuthorData Science Foundations - Level 1



Inactive hide details for Dongjoon Hyun ---01/29/2020 02:29:40 PM---Thank you again for helping us make a progress, Sean.     >Dongjoon Hyun ---01/29/2020 02:29:40 PM---Thank you again for helping us make a progress, Sean. > Does nobody object to untargeting it? ->

From: Dongjoon Hyun <[hidden email]>
To: Nicholas Marion <[hidden email]>
Cc: Sean Owen <[hidden email]>, dev <[hidden email]>
Date: 01/29/2020 02:29 PM
Subject: [EXTERNAL] Re: Spark 2.4.5 RC2 Preparation Status




Thank you again for helping us make a progress, Sean.

    > Does nobody object to untargeting it? -> then we are done for 2.4.5, right?

I hope so. However, unfortunately, there are more issues without target versions. For example, in the email thread, "Correctness and data loss issues", I asked about the opinion on 2.4.5 specifically, and get the opinion like the following. We didn't reach an agreement on these.

- https://lists.apache.org/thread.html/r8f026c768ed53f3abb11a1b588162864dd9c7fb170ad0941ec33e2d8%40%3Cdev.spark.apache.org%3E

    > SPARK-28125 dataframes created by randomSplit have overlapping rows
    >     Seems like something we should fix
    > SPARK-28067 Incorrect results in decimal aggregation with whole-stage code gen enabled
    >     Seems like we should fix

Here, I'm trying to narrow down our focus to the issues with `Explicit Target Version` and continue to release. In other words, as a release manager, I hope I can officially ignore the other correctness issues which is not targeting to 2.4.5 explicitly.

Most correctness issues are long-standing and cause behavior changes. During maintenance RC vote, for those kind of issues, I hope we set the Target Version `2.4.6` instead of casting a veto RC. It's the same policy with Fix Version. During RC vote period, Fix Version is set to the next version `2.4.6` instead of the current RC `2.4.5`. Since maintenance happens more frequently, I believe that's okay.

To Nicholas, please make a PR if you think so. We will review on there.

Bests,
Dongjoon.


On Wed, Jan 29, 2020 at 11:23 AM Nicholas Marion <[hidden email]> wrote:
    Thanks Dongjoon for the information.

    I was just wondering what was blocking the delivery. If there was a major blocker, if the fix I had mentioned (
    https://issues.apache.org/jira/browse/SPARK-30310) could get pulled into the 2.4.x branch.

    Thanks for any suggestions.


Regards,

NICHOLAS T. MARION

IBM Open Data Analytics for z/OS -
CPO and Service Team Lead

Phone: 1-845-433-5010 | Tie-Line: 293-5010
E-mail:
[hidden email]
Find me on:
LinkedIn: http://www.linkedin.com/in/nicholasmarion
IBM

2455 South Rd
Poughkeepie, New York 12601-5400
United States
IBM Redbooks Silver AuthorData Science Foundations - Level 1



    Inactive hide details for Sean Owen ---01/29/2020 01:22:46 PM---I have no opinion - just figuring out the status too. I guess ISean Owen ---01/29/2020 01:22:46 PM---I have no opinion - just figuring out the status too. I guess I'm asking first, is this the only iss

    On Wed, Jan 29, 2020 at 1:22 PM Sean Owen <[hidden email]> wrote:
    I have no opinion - just figuring out the status too.

    I guess I'm asking first, is this the only issue in question?

    Does nobody object to untargeting it? -> then we are done for 2.4.5, right?
    If anyone does -> what's the next step to resolving it?

    It wasn't clear to me from the JIRA / PR whether you are arguing it should or should not be untargeted, or really who is arguing what about it.


    On Wed, Jan 29, 2020 at 12:04 PM Dongjoon Hyun <[hidden email]> wrote:
        Great. Sean.

        Then, what are your criteria for removing the 2.4.5 target from it?

        It doesn't depend on `Who`, right?

        Bests,
        Dongjoon.


        On Wed, Jan 29, 2020 at 9:56 AM Sean Owen <[hidden email]> wrote:
        OK what if anything is in question for 2.4.5? I don't see anything open and targeted for it.
        Are we talking about https://issues.apache.org/jira/browse/SPARK-28344 - targeted for 2.4.5 but not backported, and a 'correctness' issue?
        Simply: who argues this must hold up 2.4.5, and if so what's the status?

        On Wed, Jan 29, 2020 at 11:27 AM Dongjoon Hyun <[hidden email]> wrote:
            Hi, Nicholas and all.

            RC2 is blocked by the community policy on correctness/dataloss issues.

            I cut RC1 when there was no correctness/dataloss issue targeting 2.4.5. However, it failed because one correctness issue (target = 3.0.0) was resolved and the community changed its target to 2.4.5 on the last day of the RC1 vote.

            As of now, there is a correctness issue targeting 2.4.5. As a release manager, I cannot cut RC2 until there is no correctness/dataloss issue with target=2.4.5. We need to fix it or we need to move the target version to 2.4.6.

            That's the current situation. I'm trying to follow the existing community policies, but they seem too idealistic for the release. I'm trying to figure out what the best option for 2.4.5 is in the community. Hopefully, we can at least start RC2 without known risks.

            For non-correctness issues, it's up to the progress and decision on them. Those issues are not blockers.

            Bests,
            Dongjoon.


Re: Spark 2.4.5 RC2 Preparation Status

Sean Owen-2
In reply to this post by Dongjoon Hyun-2
OK, that's specific. It's always a judgment call whether to hold the release train for one more fix or not. Depends on how impactful it is (harm of releasing without it), and how big it is (harm of delaying release of other fixes further). I think we tend to weight regressions from a previous 2.4.x release more heavily; those are typically Blockers, otherwise not. Otherwise once RCs start, we're driving primarily to a no-Blocker release. The default should be to punt to 2.4.6 -- which can come relatively soon if one wants.

SPARK-28125 is not even a bug, I'd argue, let alone Blocker. Looks like it was marked 'correctness' by the reporter. It's always been the case since Spark 1.0 (i.e. not a regression) that RDDs need to be deterministic for most of the semantics one expects to work out. If it isn't, many bets are off. I get that this is a 'gotcha', but it isn't even about the randomSplit. If anything recomputes the RDD, it could be different.
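
The determinism point can be illustrated with a toy Python sketch. This is only a simplified model and not Spark's implementation: the fixed `DRAWS` list stands in for a seeded random stream, and the `split` helper and the row values are hypothetical names made up for the illustration. It assumes each split keeps the rows whose positional draw falls in its range, so the splits only complement each other when every pass sees the rows in the same order.

```python
# Toy model only (assumption): draws are tied to row position, as with a
# seeded sampler that is replayed for each split.
DRAWS = [0.1, 0.9, 0.4, 0.8, 0.2, 0.6]  # fixed stand-in for a seeded RNG stream


def split(rows, lo, hi):
    """Keep the rows whose positional draw falls in [lo, hi)."""
    return [row for row, draw in zip(rows, DRAWS) if lo <= draw < hi]


rows = ["a", "b", "c", "d", "e", "f"]

# Deterministic input: both splits see the same order, so they partition the data.
train = split(rows, 0.0, 0.5)
test = split(rows, 0.5, 1.0)
assert sorted(train + test) == rows and not set(train) & set(test)

# Non-deterministic input: the second split sees a recomputed, reordered input.
recomputed = list(reversed(rows))
test_again = split(recomputed, 0.5, 1.0)

overlap = set(train) & set(test_again)               # rows landing in both splits
missing = set(rows) - set(train) - set(test_again)   # rows landing in neither
```

In this model the reordered second pass puts the rows a, c, and e in both splits and leaves b, d, and f in neither, which is the kind of symptom reported for randomSplit when the input is recomputed differently.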

I don't know anything about SPARK-28067, but it is also being reported as not a 2.4.x regression, and I don't see anyone working on it. For that reason, I'm not sure it's a Blocker for 2.4.x.

SPARK-30310 is not a 2.4.x regression either, nor particularly critical IMHO. That doesn't mean we can't back-port it to 2.4 though, and it's 'done' (in master).

Anything else? Not according to JIRA, at least.

I think it's valid to continue with RC2 assuming none of these are necessary for 2.4.5. 
It's not wrong to 'wait' if there are strong feelings about something, but, if we can't see a reason to expect the situation changes in a week, 2 weeks, then, why? The release of 2.4.5 nowish doesn't necessarily make the release of said fix much further away -- in 2.4.6.




Re: Spark 2.4.5 RC2 Preparation Status

Dongjoon Hyun-2
Thanks, Sean.

If there is no further objection on the mailing list,
could you remove `Target Version: 2.4.5` from the following issues?

    SPARK-28344 Fail the query if detect ambiguous self join
    SPARK-29578 JDK 1.8.0_232 timezone updates cause "Kwajalein" test failures again

Then, after the regular RC preparation testing including the manual integration tests,
I can roll 2.4.5 RC2 next Monday (Feb. 3rd, PST) and all late blocker patches will block 2.4.6 instead of causing RC failure.

Bests,
Dongjoon.




Re: Spark 2.4.5 RC2 Preparation Status

Sean Owen-2
OK, we can wait a tick to confirm there aren't strong objections.
I suppose I'd prefer someone who knows
https://issues.apache.org/jira/browse/SPARK-28344 to confirm it was
either erroneously targeted to 2.4, or else it's valid, but, not
critical for the RC. Hearing nothing else shortly, I'd untarget it.

SPARK-29578 is a tiny low-risk test change but probably worth picking
up to avoid failing on certain JDKs during testing. I'll make a
back-port, as this should be noncontroversial. (Not sure why I didn't
backport originally)



Re: Spark 2.4.5 RC2 Preparation Status

Dongjoon Hyun-2
Got it. Thanks! 

Bests,
Dongjoon.
