Resolving all JIRAs affecting EOL releases


Hyukjin Kwon
Hi all,

I would like to propose resolving all JIRAs that affect EOL releases (2.2 and below), as well as those with
no affected version specified. I was rather against this approach and considered it a last resort when we
discussed it roughly 3 years ago. Now I think we should go ahead with it. See below.

I have been taking care of this for a long time, almost every day for the past 3 years. The number of JIRAs
keeps increasing and never goes down. It is now over 2500.
Did you know that in JIRA we can only page through up to 1000 items? So we currently have difficulty even
going through every JIRA; we have to filter manually and check each one.
The number has grown beyond a manageable size.
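
One possible way around the paging limit is to walk the search results through the JIRA REST API rather than the web UI. The sketch below is only an illustration: it assumes a search endpoint that accepts `startAt` and `maxResults` and returns a JSON object with `issues` and `total` (as `/rest/api/2/search` does). `fetch_page` is a hypothetical callable that wraps the HTTP request, and `make_fake_fetcher` exists only so the loop can be run without a server.

```python
from typing import Callable, Dict, Iterator, List


def iter_issues(fetch_page: Callable[[int, int], Dict],
                page_size: int = 100) -> Iterator[Dict]:
    """Yield every issue by requesting consecutive pages.

    fetch_page(start_at, max_results) is a hypothetical wrapper around the
    search request; it must return a dict with "issues" and "total".
    """
    start_at = 0
    while True:
        page = fetch_page(start_at, page_size)
        issues: List[Dict] = page.get("issues", [])
        if not issues:
            break  # nothing left to read
        yield from issues
        start_at += len(issues)
        if start_at >= page.get("total", 0):
            break  # every reported issue has been seen


def make_fake_fetcher(total: int) -> Callable[[int, int], Dict]:
    """Stand-in for the real HTTP call, for demonstration only."""
    keys = [f"SPARK-{n}" for n in range(1, total + 1)]

    def fetch_page(start_at: int, max_results: int) -> Dict:
        chunk = keys[start_at:start_at + max_results]
        return {
            "startAt": start_at,
            "maxResults": max_results,
            "total": total,
            "issues": [{"key": key} for key in chunk],
        }

    return fetch_page


if __name__ == "__main__":
    seen = list(iter_issues(make_fake_fetcher(2500), page_size=100))
    print(len(seen), seen[0]["key"], seen[-1]["key"])
```

A server may cap `maxResults` below the requested value, which is why the loop advances by the number of issues actually returned instead of by `page_size`.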

I am not suggesting this without having tried anything. This is what we have tried, as far as I am aware:

  1. Roughly 3 years ago, Sean tried to gather committers and even non-committers to sort
    out this number. At that time, we were only able to keep the number as is. After we lost that momentum,
    it started increasing again.
  2. I scanned _all_ the previous JIRAs more than twice and resolved them, roughly
    once a year. The remaining ones are mostly obsolete but lack enough information to investigate further.
  3. I strictly stick to "Contributing to JIRA Maintenance" https://spark.apache.org/contributing.html and
    resolve JIRAs.
  4. I have been encouraging other people to comment on JIRAs or actively resolve them.

One thing I realised is that the increasing number of committers hardly helps with this (although
it might help if somebody active in JIRA becomes a committer).

One important thing I should note is that it is now quite difficult to reproduce and test the
issues found in EOL releases. We have to git clone, check out, build and test, and then see whether the issue
still exists upstream, and fix it. This is non-trivial overhead.

Therefore, I would like to propose resolving _all_ the JIRAs that target EOL releases (2.2 and below).
Please let me know if anyone has concerns or objections.

Thanks.

Re: Resolving all JIRAs affecting EOL releases

Abdeali Kothari
I was thinking it would help to estimate how many issues would be closed if this is done.

Open issues: 3882 (project = SPARK AND status in (Open, "In Progress", Reopened))
Open + Does not affect 3.0+ = 2795
Open + Does not affect 2.4+ = 2373
Open + Does not affect 2.3+ = 1765
Open + Does not affect 2.2+ = 1322
Open + Does not affect 2.1+ = 967
Open + Does not affect 2.0+ = 651

Open + Does not affect 2.0+ + Priority in (Urgent, Blocker, Critical, High) [JQL1] = 838
Open + Does not affect 2.0+ + Priority in (Urgent, Blocker, Critical, High, Major) = 206
Open + Does not affect 2.2+ + Priority not in (Urgent, Blocker, Critical, High) [JQL2] = 1303
Open + Does not affect 2.2+ + Priority not in (Urgent, Blocker, Critical, High, Major) = 397
Open + Does not affect 2.3+ + Priority not in (Urgent, Blocker, Critical, High) = 1743
Open + Does not affect 2.3+ + Priority not in (Urgent, Blocker, Critical, High, Major) = 550

Resolving ALL seems a bit overkill to me.
My current opinion is:
 - Resolving "Open + Does not affect 2.0+" is something that should be done, as I doubt anyone is still looking at the 1.x versions (651 tasks)
 - Resolving "Open + Does not affect 2.3+ + Priority not in (Urgent, Blocker, Critical, High, Major)" may be a good idea (an additional ~1k tasks)
The issues with priority Urgent/Blocker/Critical should be triaged, as they may contain something important.


[JQL1]:
project = SPARK 
 AND status in (Open, "In Progress", Reopened) 
 AND NOT (affectedVersion in versionMatch("^[2-3].*"))
 AND priority NOT IN (Urgent, Blocker, Critical, High)

[JQL2]:
project = SPARK 
 AND status in (Open, "In Progress", Reopened) 
 AND NOT (affectedVersion in versionMatch("^3.*") OR affectedVersion in versionMatch("^2.4.*") OR affectedVersion in versionMatch("^2.3.*") OR affectedVersion in versionMatch("^2.2.*"))
 AND priority NOT IN (Urgent, Blocker, Critical, High)
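
Since the queries above differ only in which version prefixes and priorities they exclude, they could be generated rather than typed by hand. The following is a minimal sketch, assuming the `versionMatch` function used in the queries above is available on the JIRA instance; the helper name `not_affecting_jql` and its parameters are made up for illustration.

```python
from typing import Iterable, Sequence

# Statuses treated as "open" in the queries above.
OPEN_STATUSES = ("Open", '"In Progress"', "Reopened")


def not_affecting_jql(version_prefixes: Sequence[str],
                      excluded_priorities: Iterable[str] = (),
                      project: str = "SPARK") -> str:
    """Build a JQL string for open issues that affect none of the given versions.

    Each prefix becomes a versionMatch("^<prefix>.*") clause, written the same
    way as in the hand-written queries (the dots are not escaped).
    """
    if not version_prefixes:
        raise ValueError("at least one version prefix is required")
    matches = " OR ".join(
        f'affectedVersion in versionMatch("^{prefix}.*")'
        for prefix in version_prefixes
    )
    clauses = [
        f"project = {project}",
        f"status in ({', '.join(OPEN_STATUSES)})",
        f"NOT ({matches})",
    ]
    priorities = ", ".join(excluded_priorities)
    if priorities:
        clauses.append(f"priority NOT IN ({priorities})")
    return "\n AND ".join(clauses)


if __name__ == "__main__":
    # Reproduces the shape of [JQL2].
    print(not_affecting_jql(["3", "2.4", "2.3", "2.2"],
                            ["Urgent", "Blocker", "Critical", "High"]))
```

Note that an unescaped `.` in these patterns matches any character, so `^2.4.*` is slightly broader than "starts with 2.4"; the sketch keeps the patterns exactly as written in the queries above.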


Re: Resolving all JIRAs affecting EOL releases

Hyukjin Kwon
Yea, a more sophisticated condition is welcome. My only goal is to bring this down to a manageable size.

I would go for the option that removes the most tickets, ideally leaving under 1000 OPEN (and REOPENED) tickets, so that we can at least go through them in one pass without having to come up with non-overlapping filters.
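
As a rough check of that goal against the counts quoted in this thread, the arithmetic can be sketched as follows. It assumes that each "Does not affect X+" count is exactly the number of tickets that would be resolved, and that 3882 (which also includes "In Progress" tickets) is the right baseline; both are simplifications.

```python
# Counts quoted earlier in the thread.
TOTAL_OPEN = 3882
NOT_AFFECTING = {
    "3.0+": 2795,
    "2.4+": 2373,
    "2.3+": 1765,
    "2.2+": 1322,
    "2.1+": 967,
    "2.0+": 651,
}
TARGET = 1000  # the "one pass" size mentioned above


def remaining_after(closed: int, total: int = TOTAL_OPEN) -> int:
    """Tickets left open if `closed` of them are resolved in bulk."""
    return total - closed


if __name__ == "__main__":
    for label, closed in NOT_AFFECTING.items():
        left = remaining_after(closed)
        verdict = "under" if left < TARGET else "over"
        print(f"Does not affect {label}: {left} left ({verdict} {TARGET})")
```

Under these assumptions, even the widest version filter leaves 3882 - 2795 = 1087 tickets, so a version filter alone may not reach the target.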

Re: Resolving all JIRAs affecting EOL releases

Sean Owen-2
In reply to this post by Hyukjin Kwon
I gave up looking through JIRAs a long time ago, so, big respect for
continuing to try to triage them. I am afraid we're missing a few
important bug reports in the torrent, but most JIRAs are not
well-formed, just questions, stale, or simply things that won't be
added. I do think it's important to reflect that reality, and so I'm
always in favor of more aggressively closing JIRAs. I think it is
fairly standard practice in projects like TensorFlow/Keras, pandas,
etc. to automatically close issues that don't see activity for N
days. We won't do that, but we are probably, on the other hand, far
too lax in closing them.

Remember that JIRAs stay searchable and can be reopened, so it's not
like we lose much information.

I'd close anything that hasn't had activity in 2 years (?), as a start.
I like the idea of closing things that only affect an EOL release,
but many items aren't marked, so we may need to cast the net wider.

I think only then does it make sense to look at bothering to reproduce
or evaluate the 1000s that will still remain.
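
The "no activity in 2 years" rule could be expressed as a JQL filter along these lines. This is only a sketch: it treats the `updated` field as a proxy for "activity" and writes the cutoff as a relative date in days (730 days for roughly two years). The helper name `stale_jql` is made up for illustration.

```python
def stale_jql(days_inactive: int = 730, project: str = "SPARK") -> str:
    """Build a JQL string for open issues not updated in `days_inactive` days.

    Assumption: "activity" is approximated by the `updated` field, and the
    cutoff is given as a relative date such as -730d.
    """
    if days_inactive <= 0:
        raise ValueError("days_inactive must be positive")
    return (
        f"project = {project} "
        'AND status in (Open, "In Progress", Reopened) '
        f"AND updated <= -{days_inactive}d"
    )


if __name__ == "__main__":
    print(stale_jql())
```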

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Resolving all JIRAs affecting EOL releases

Josh Rosen
+1 in favor of some sort of JIRA cleanup. 

My only request is that we attach some sort of 'bulk-closed' label to issues that we close via JIRA filter batch operations (and resolve the issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label makes it easier to audit what was closed, simplifying the process of identifying and re-opening valid issues caught in our dragnet.
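
As a rough illustration of the labelling request above, a bulk close done through the JIRA REST API (instead of the UI batch operation) might send a transition body like the one built below. The body shape (`transition`, `fields`, `update`) follows the issue transition endpoint `POST /rest/api/2/issue/{key}/transitions`; the transition id is specific to each JIRA workflow, so the `"5"` used here is purely a placeholder, and the resolution name must exist on the instance.

```python
import json
from typing import Dict


def bulk_close_payload(transition_id: str,
                       resolution: str = "Cannot Reproduce",
                       label: str = "bulk-closed") -> Dict:
    """Request body that resolves an issue and tags it for later auditing.

    transition_id is workflow-specific and must be looked up on the target
    instance; the value passed in the example below is a placeholder.
    """
    return {
        "transition": {"id": transition_id},
        "fields": {"resolution": {"name": resolution}},
        "update": {"labels": [{"add": label}]},
    }


def audit_jql(label: str = "bulk-closed", project: str = "SPARK") -> str:
    """JQL to list everything that was closed with the audit label."""
    return f'project = {project} AND labels = "{label}"'


if __name__ == "__main__":
    print(json.dumps(bulk_close_payload("5"), indent=2))  # "5" is hypothetical
    print(audit_jql())
```

Keeping the label in the same request as the resolution means no issue can end up bulk-closed without being findable by the audit query.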


Re: Resolving all JIRAs affecting EOL releases

Hyukjin Kwon
BTW, affected version became a required field (I don't remember exactly when; I believe it was around the time we were working on Spark 2.3):

[Attachment: Screen Shot 2019-05-16 at 10.29.50 AM.png]

So, covering all EOL versions plus JIRAs with no affected version specified should roughly work.
Using "Cannot Reproduce" as the status together with a 'bulk-closed' label makes the most sense to me.

Okie. I want to keep this open for roughly a week before taking any action. If there's no more feedback, I will do as I said above next week.


Re: Resolving all JIRAs affecting EOL releases

Sean Owen-2
Agree, anything without an Affected Version should be old enough to time out.
I might use "Incomplete" or something as the status, as we haven't otherwise used that. Maybe that's simpler than a label. But, anything like that sounds good.

Re: Resolving all JIRAs affecting EOL releases

Hyukjin Kwon
I actually used 'Incomplete' a few times recently when a JIRA was simply too poorly formed (like just a copied and pasted error) ...

I was thinking about the 'Unresolved' status or 'Auto Closed' too. I double checked that they can be reopened as well after resolution.

[Attachment: Screen Shot 2019-05-16 at 10.35.14 AM.png]
[Attachment: Screen Shot 2019-05-16 at 10.35.39 AM.png]

Re: Resolving all JIRAs affecting EOL releases

Hyukjin Kwon
Oh, wait. 'Incomplete' still makes sense that way, then.
Yes, I am fine with 'Incomplete' too.

On Thu, May 16, 2019 at 11:24 AM, Hyukjin Kwon <[hidden email]> wrote:
I actually used 'Incomplete' a bit recently when the JIRA was basically too poorly formed (like just copying and pasting an error) ...

I was thinking about the 'Unresolved' status or 'Auto Closed' too. I double-checked that they can be reopened after resolution as well.

Screen Shot 2019-05-16 at 10.35.14 AM.png
Screen Shot 2019-05-16 at 10.35.39 AM.png

On Thu, May 16, 2019 at 11:04 AM, Sean Owen <[hidden email]> wrote:
Agree, anything without an Affected Version should be old enough to time out.
I might use "Incomplete" or something as the status, as we haven't otherwise used that. Maybe that's simpler than a label. But, anything like that sounds good.

On Wed, May 15, 2019 at 8:40 PM Hyukjin Kwon <[hidden email]> wrote:
BTW, Affected Version became a required field (I don't remember exactly when; I believe it was around when we were working on Spark 2.3):

Screen Shot 2019-05-16 at 10.29.50 AM.png

So, covering all EOL versions plus JIRAs with no affected version specified will roughly work.
Using "Cannot Reproduce" as the status and a 'bulk-closed' label makes the most sense to me.

Okie. I want to leave this open for roughly a week before taking any action. If there's no more feedback, I will do as I said ^ next week.


On Wed, May 15, 2019 at 11:33 PM, Josh Rosen <[hidden email]> wrote:
+1 in favor of some sort of JIRA cleanup.

My only request is that we attach some sort of 'bulk-closed' label to issues that we close via JIRA filter batch operations (and resolve the issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label makes it easier to audit what was closed, simplifying the process of identifying and re-opening valid issues caught in our dragnet.



Re: Resolving all JIRAs affecting EOL releases

Imran Rashid-4
+1, thanks for taking this on


Re: Resolving all JIRAs affecting EOL releases

Dongjoon Hyun-2
+1, too.

Thank you, Hyukjin!

Bests,
Dongjoon.



Re: Resolving all JIRAs affecting EOL releases

Hyukjin Kwon
Thanks guys.

This thread got more than 3 PMC votes without any objections. I slightly edited the JQL from Abdeali's suggestion (thanks, Abdeali).


JQL:

project = SPARK
  AND status in (Open, "In Progress", Reopened)
  AND (
    affectedVersion = EMPTY OR
    NOT (affectedVersion in versionMatch("^3.*")
      OR affectedVersion in versionMatch("^2.4.*")
      OR affectedVersion in versionMatch("^2.3.*")
    )
  )


https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20%0A%20%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%0A%20%20AND%20(%0A%20%20%20%20affectedVersion%20%3D%20EMPTY%20OR%0A%20%20%20%20NOT%20(affectedVersion%20in%20versionMatch(%22%5E3.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.4.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.3.*%22)%0A%20%20%20%20)%0A%20%20)


This means we will resolve all JIRAs that list only EOL releases as affected versions, including those with no affected version specified. This will bring the number of open JIRAs under 900.
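
Read as plain logic, the filter above could be sketched in Python roughly like this. It is only an illustration of one reading of the JQL, not something JIRA runs: the name `is_selected` and the plain-string inputs are made up for the example, and the patterns are copied as-is from the query.

```python
import re

# Patterns copied verbatim from the JQL. The dots are unescaped there,
# so "." matches any character, not only a literal dot.
SUPPORTED = [re.compile(p) for p in (r"^3.*", r"^2.4.*", r"^2.3.*")]
OPEN_STATUSES = {"Open", "In Progress", "Reopened"}


def is_selected(status, affected_versions):
    """Return True if an issue would fall under the bulk-close filter."""
    if status not in OPEN_STATUSES:
        return False
    if not affected_versions:
        # affectedVersion = EMPTY
        return True
    # NOT (any affected version matches a still-supported line)
    return not any(p.match(v) for v in affected_versions for p in SUPPORTED)
```

Under this reading, an issue reported against both 2.2.0 and 2.4.0 is left alone, because one of its affected versions is still supported.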

Looks like I can use the bulk action feature in JIRA. Tomorrow at around the same time, I will
- Label those JIRAs as 'bulk-closed'
- Resolve them with the `Incomplete` status.

Please double-check the list and let me know if you have any concerns.






Re: Resolving all JIRAs affecting EOL releases

Sean Owen-2
I'd only tweak this to perhaps not close JIRAs that have been updated recently -- even just avoiding things updated in the last month. For example this would close https://issues.apache.org/jira/browse/SPARK-27758 which was opened Friday (though, for other reasons it should probably be closed). Still I don't mind it under the logic that it has been reported against 2.1.0.

On the other hand, I'd go further and close _anything_ not updated in a long time, like a year (or 2 if feeling conservative). That is, there's probably a lot of old cruft out there that wasn't marked with an Affected Version, before that was required.

On Sat, May 18, 2019 at 10:48 PM Hyukjin Kwon <[hidden email]> wrote:
Thanks guys.

This thread got more than 3 PMC votes without any objection. I slightly edited JQL from Abdeali's suggestion (thanks, Abdeali).


JQL:

project = SPARK
  AND status in (Open, "In Progress", Reopened)
  AND (
    affectedVersion = EMPTY OR
    NOT (affectedVersion in versionMatch("^3.*")
      OR affectedVersion in versionMatch("^2.4.*")
      OR affectedVersion in versionMatch("^2.3.*")
    )
  )


https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20%0A%20%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%0A%20%20AND%20(%0A%20%20%20%20affectedVersion%20%3D%20EMPTY%20OR%0A%20%20%20%20NOT%20(affectedVersion%20in%20versionMatch(%22%5E3.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.4.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.3.*%22)%0A%20%20%20%20)%0A%20%20)


It means we will resolve all JIRAs that have EOL releases as affected versions, including no version specified in affected versions - this will reduce open JIRAs under 900.

Looks I can use a bulk action feature in JIRA. Tomorrow at the similar time, I will
- Label those JIRAs as 'bulk-closed'
- Resolve them via `Incomplete` status.

Please double check the list and let me know if you guys have any concern.





2019년 5월 18일 (토) 오후 12:22, Dongjoon Hyun <[hidden email]>님이 작성:
+1, too.

Thank you, Hyukjin!

Bests,
Dongjoon.


On Fri, May 17, 2019 at 9:07 AM Imran Rashid <[hidden email]> wrote:
+1, thanks for taking this on

On Wed, May 15, 2019 at 7:26 PM Hyukjin Kwon <[hidden email]> wrote:
oh, wait. 'Incomplete' can still make sense in this way then.
Yes, I am good with 'Incomplete' too.

2019년 5월 16일 (목) 오전 11:24, Hyukjin Kwon <[hidden email]>님이 작성:
I actually recently used 'Incomplete'  a bit when the JIRA is basically too poorly formed (like just copying and pasting an error) ...

I was thinking about 'Unresolved' status or `Auto Closed' too. I double checked they can be reopen as well after resolution.

Screen Shot 2019-05-16 at 10.35.14 AM.png
Screen Shot 2019-05-16 at 10.35.39 AM.png

2019년 5월 16일 (목) 오전 11:04, Sean Owen <[hidden email]>님이 작성:
Agree, anything without an Affected Version should be old enough to time out.
I might use "Incomplete" or something as the status, as we haven't otherwise used that. Maybe that's simpler than a label. But, anything like that sounds good.

On Wed, May 15, 2019 at 8:40 PM Hyukjin Kwon <[hidden email]> wrote:
BTW, affected version became a required field (I don't remember when exactly was .. I believe it's around when we work on Spark 2.3):

Screen Shot 2019-05-16 at 10.29.50 AM.png

So, including all EOL versions and affected versions not specified will roughly work.
Using "Cannot Reproduce" as its status and 'bulk-closed' label makes the best sense to me.

Okie. I want to open this roughly for a week before taking an actual action for this. If there's no more feedback, I will do as I said ^ next week.


2019년 5월 15일 (수) 오후 11:33, Josh Rosen <[hidden email]>님이 작성:
+1 in favor of some sort of JIRA cleanup. 

My only request is that we attach some sort of 'bulk-closed' label to issues that we close via JIRA filter batch operations (and resolve the issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label makes it easier to audit what was closed, simplifying the process of identifying and re-opening valid issues caught in our dragnet.


On Wed, May 15, 2019 at 7:19 AM Sean Owen <[hidden email]> wrote:
I gave up looking through JIRAs a long time ago, so, big respect for
continuing to try to triage them. I am afraid we're missing a few
important bug reports in the torrent, but most JIRAs are not
well-formed, just questions, stale, or simply things that won't be
added. I do think it's important to reflect that reality, and so I'm
always in favor of more aggressively closing JIRAs. I think this is
more standard practice, from projects like TensorFlow/Keras, pandas,
etc to just automatically drop Issues that don't see activity for N
days. We won't do that, but, are probably on the other hand far too
lax in closing them.

Remember that JIRAs stay searchable and can be reopened, so it's not
like we lose much information.

I'd close anything that hasn't had activity in 2 years (?), as a start.
I like the idea of closing things that only affect an EOL release,
but, many items aren't marked, so may need to cast the net wider.

I think only then does it make sense to look at bothering to reproduce
or evaluate the 1000s that will still remain.

On Wed, May 15, 2019 at 4:25 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> I would like to propose to resolve all JIRAs that affects EOL releases - 2.2 and below. and affected version
> not specified. I was rather against this way and considered this as last resort in roughly 3 years ago
> when we discussed. Now I think we should go ahead with this. See below.
>
> I have been talking care of this for so long time almost every day those 3 years. The number of JIRAs
> keeps increasing and it does never go down. Now the number is going over 2500 JIRAs.
> Did you guys know? in JIRA, we can only go through page by page up to 1000 items. So, currently we're even
> having difficulties to go through every JIRA. We should manually filter out and check each.
> The number is going over the manageable size.
>
> I am not suggesting this without anything actually trying. This is what we have tried within my visibility:
>
>   1. In roughly 3 years ago, Sean tried to gather committers and even non-committers people to sort
>     out this number. At that time, we were only able to keep this number as is. After we lost this momentum,
>     it kept increasing back.
>   2. At least I scanned _all_ the previous JIRAs at least more than two times and resolved them. Roughly
>     once a year. The rest of them are mostly obsolete but not enough information to investigate further.
>   3. I strictly stick to "Contributing to JIRA Maintenance" https://spark.apache.org/contributing.html and
>     resolve JIRAs.
>   4. Promoting other people to comment on JIRA or actively resolve them.
>
> One of the facts I realised is the increasing number of committers doesn't virtually help this much (although
> it might be helpful if somebody active in JIRA becomes a committer.)
>
> One important thing to note is that it is now quite difficult to reproduce and test
> issues found in EOL releases. We have to git clone, checkout, build and test, and then see whether the issue
> still exists upstream, and fix it. This is non-trivial overhead.
>
> Therefore, I would like to propose resolving _all_ the JIRAs that target EOL releases - 2.2 and below.
> Please let me know if anyone has any concerns or objections.
>
> Thanks.



Re: Resolving all JIRAs affecting EOL releases

Hyukjin Kwon
I will add one more condition on "updated", so the filter also skips JIRAs that were updated within the last year even though they are still open against EOL releases.

project = SPARK
  AND status in (Open, "In Progress", Reopened)
  AND (
    affectedVersion = EMPTY OR
    NOT (affectedVersion in versionMatch("^3.*")
      OR affectedVersion in versionMatch("^2.4.*")
      OR affectedVersion in versionMatch("^2.3.*")
    )
  )
  AND updated <= -52w


https://issues.apache.org/jira/issues/?filter=12344168&jql=project%20%3D%20SPARK%20%0A%20%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%0A%20%20AND%20(%0A%20%20%20%20affectedVersion%20%3D%20EMPTY%20OR%0A%20%20%20%20NOT%20(affectedVersion%20in%20versionMatch(%22%5E3.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.4.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.3.*%22)%0A%20%20%20%20)%0A%20%20)%0A%20%20AND%20updated%20%3C%3D%20-52w

This still brings the number of open JIRAs under 1000, which was my original target.
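
For anyone who wants to sanity-check what the filter above selects, here is a minimal Python sketch of the same predicate applied to local records. The issue dictionaries and their field names (`status`, `affected_versions`, `updated`) are hypothetical, not a JIRA API, and the reading of `NOT (... OR ...)` as "no affected version matches a supported release" is an assumption about how JIRA evaluates the query.

```python
import re
from datetime import datetime, timedelta

# Same patterns as in the JQL above. As regexes the dots are unescaped,
# so "^2.4.*" would also match a version string such as "214.0".
SUPPORTED = [re.compile(p) for p in (r"^3.*", r"^2.4.*", r"^2.3.*")]
OPEN_STATUSES = {"Open", "In Progress", "Reopened"}


def is_bulk_close_candidate(issue, now):
    """Return True if a (hypothetical) issue record matches the filter."""
    if issue["status"] not in OPEN_STATUSES:
        return False
    versions = issue["affected_versions"]
    # Assumption: the NOT (...) clause means that none of the affected
    # versions matches any supported-release pattern.
    touches_supported = any(p.match(v) for v in versions for p in SUPPORTED)
    # "updated <= -52w": last update is at least 52 weeks old.
    stale = issue["updated"] <= now - timedelta(weeks=52)
    return (not versions or not touches_supported) and stale


now = datetime(2019, 5, 20)
old = datetime(2017, 1, 1)
examples = [
    {"status": "Open", "affected_versions": ["2.1.0"], "updated": old},
    {"status": "Open", "affected_versions": [], "updated": old},
    {"status": "Open", "affected_versions": ["2.1.0", "2.4.0"], "updated": old},
    {"status": "Open", "affected_versions": ["2.1.0"], "updated": datetime(2019, 5, 17)},
]
flags = [is_bulk_close_candidate(i, now) for i in examples]
```

Under these assumptions, an issue that lists both an EOL version and a supported one is kept open, and so is anything touched within the last 52 weeks.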



On Sun, May 19, 2019 at 6:08 PM Sean Owen <[hidden email]> wrote:
I'd only tweak this to perhaps not close JIRAs that have been updated recently -- even just avoiding things updated in the last month. For example this would close https://issues.apache.org/jira/browse/SPARK-27758 which was opened Friday (though, for other reasons it should probably be closed). Still I don't mind it under the logic that it has been reported against 2.1.0.

On the other hand, I'd go further and close _anything_ not updated in a long time, like a year (or 2 if feeling conservative). That is, there's probably a lot of old cruft out there that wasn't marked with an Affected Version, before that was required.

On Sat, May 18, 2019 at 10:48 PM Hyukjin Kwon <[hidden email]> wrote:
Thanks guys.

This thread got more than 3 PMC votes without any objection. I slightly edited the JQL from Abdeali's suggestion (thanks, Abdeali).


JQL:

project = SPARK
  AND status in (Open, "In Progress", Reopened)
  AND (
    affectedVersion = EMPTY OR
    NOT (affectedVersion in versionMatch("^3.*")
      OR affectedVersion in versionMatch("^2.4.*")
      OR affectedVersion in versionMatch("^2.3.*")
    )
  )


https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20%0A%20%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%0A%20%20AND%20(%0A%20%20%20%20affectedVersion%20%3D%20EMPTY%20OR%0A%20%20%20%20NOT%20(affectedVersion%20in%20versionMatch(%22%5E3.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.4.*%22)%0A%20%20%20%20%20%20OR%20affectedVersion%20in%20versionMatch(%22%5E2.3.*%22)%0A%20%20%20%20)%0A%20%20)


It means we will resolve all JIRAs whose affected versions are EOL releases, including those with no affected version specified. This will bring the number of open JIRAs under 900.
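
The link above is simply the JQL percent-encoded into the `jql` query parameter. A minimal sketch of rebuilding such a link in Python, assuming the standard library's `urllib.parse.quote`; the `search_url` helper is hypothetical, and whitespace may differ slightly from the link in this mail:

```python
from urllib.parse import quote, unquote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="

jql = """project = SPARK
  AND status in (Open, "In Progress", Reopened)
  AND (
    affectedVersion = EMPTY OR
    NOT (affectedVersion in versionMatch("^3.*")
      OR affectedVersion in versionMatch("^2.4.*")
      OR affectedVersion in versionMatch("^2.3.*")
    )
  )"""


def search_url(query):
    # Keep "(", ")" and "*" literal, as the link above does; other reserved
    # characters are percent-encoded (space -> %20, newline -> %0A).
    return JIRA_SEARCH + quote(query, safe="()*")


url = search_url(jql)
# Decoding the parameter gives back the original query text.
decoded = unquote(url[len(JIRA_SEARCH):])
```

Generating the link from the query text this way avoids the two drifting apart when the JQL is edited again.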

It looks like I can use JIRA's bulk action feature. Tomorrow at around the same time, I will
- Label those JIRAs as 'bulk-closed'
- Resolve them via `Incomplete` status.

Please double check the list and let me know if you have any concerns.
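
The two steps above could in principle also be scripted rather than done through the bulk action UI. The following is only a sketch, assuming the JIRA REST API v2 issue-edit and transition endpoints. The `bulk_close_requests` helper, the issue key and the transition id are hypothetical (transition ids are specific to each JIRA instance and workflow), and nothing is actually sent:

```python
import json


def bulk_close_requests(issue_key, transition_id):
    """Build (method, path, body) descriptions for one issue; sends nothing."""
    # Step 1: add the 'bulk-closed' label so the closure can be audited later.
    label = (
        "PUT",
        f"/rest/api/2/issue/{issue_key}",
        {"update": {"labels": [{"add": "bulk-closed"}]}},
    )
    # Step 2: run the workflow transition that resolves the issue as Incomplete.
    # Assumption: that transition allows setting the resolution field.
    resolve = (
        "POST",
        f"/rest/api/2/issue/{issue_key}/transitions",
        {
            "transition": {"id": transition_id},
            "fields": {"resolution": {"name": "Incomplete"}},
        },
    )
    return [label, resolve]


# "SPARK-0000" and "5" are placeholders, not real values.
requests_to_send = bulk_close_requests("SPARK-0000", "5")
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

Because every closed issue carries the label, a search on `labels = "bulk-closed"` should be enough to review or reopen anything caught by mistake.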





On Sat, May 18, 2019 at 12:22 PM Dongjoon Hyun <[hidden email]> wrote:
+1, too.

Thank you, Hyukjin!

Bests,
Dongjoon.


On Fri, May 17, 2019 at 9:07 AM Imran Rashid <[hidden email]> wrote:
+1, thanks for taking this on

On Wed, May 15, 2019 at 7:26 PM Hyukjin Kwon <[hidden email]> wrote:
oh, wait. 'Incomplete' can still make sense in this way then.
Yes, I am good with 'Incomplete' too.

On Thu, May 16, 2019 at 11:24 AM Hyukjin Kwon <[hidden email]> wrote:
I actually used 'Incomplete' a few times recently when a JIRA was too poorly formed (like just copying and pasting an error) ...

I was thinking about an 'Unresolved' or 'Auto Closed' status too. I double checked that they can be reopened after resolution as well.

Screen Shot 2019-05-16 at 10.35.14 AM.png
Screen Shot 2019-05-16 at 10.35.39 AM.png

On Thu, May 16, 2019 at 11:04 AM Sean Owen <[hidden email]> wrote:
Agree, anything without an Affected Version should be old enough to time out.
I might use "Incomplete" or something as the status, as we haven't otherwise used that. Maybe that's simpler than a label. But, anything like that sounds good.

On Wed, May 15, 2019 at 8:40 PM Hyukjin Kwon <[hidden email]> wrote:
BTW, affected version became a required field (I don't remember exactly when; I believe it was around when we were working on Spark 2.3):

Screen Shot 2019-05-16 at 10.29.50 AM.png

So, covering all EOL versions plus JIRAs with no affected version specified will roughly work.
Using "Cannot Reproduce" as the status together with a 'bulk-closed' label makes the most sense to me.

OK. I want to keep this open for roughly a week before taking any action. If there is no more feedback, I will do as described above next week.


On Wed, May 15, 2019 at 11:33 PM Josh Rosen <[hidden email]> wrote:
+1 in favor of some sort of JIRA cleanup. 

My only request is that we attach some sort of 'bulk-closed' label to issues that we close via JIRA filter batch operations (and resolve the issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label makes it easier to audit what was closed, simplifying the process of identifying and re-opening valid issues caught in our dragnet.


On Wed, May 15, 2019 at 7:19 AM Sean Owen <[hidden email]> wrote:
I gave up looking through JIRAs a long time ago, so, big respect for
continuing to try to triage them. I am afraid we're missing a few
important bug reports in the torrent, but most JIRAs are not
well-formed, just questions, stale, or simply things that won't be
added. I do think it's important to reflect that reality, and so I'm
always in favor of more aggressively closing JIRAs. I think this is
more standard practice, from projects like TensorFlow/Keras, pandas,
etc to just automatically drop Issues that don't see activity for N
days. We won't do that, but, are probably on the other hand far too
lax in closing them.

Remember that JIRAs stay searchable and can be reopened, so it's not
like we lose much information.

I'd close anything that hasn't had activity in 2 years (?), as a start.
I like the idea of closing things that only affect an EOL release,
but, many items aren't marked, so may need to cast the net wider.

I think only then does it make sense to look at bothering to reproduce
or evaluate the 1000s that will still remain.

