Adding JIRA ID as the prefix for the test case name

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13



Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Takeshi Yamamuro
+1 for having that consistent rule in test names.
This is a trivial problem though, I think documenting this rule in the contribution guide
might be able to make reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon <[hidden email]> wrote:
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13





--
---
Takeshi Yamamuro
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Gengliang
+1 for making it a guideline. This is helpful when the test cases are moved to a different file.

On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro <[hidden email]> wrote:
+1 for having that consistent rule in test names.
This is a trivial problem though, I think documenting this rule in the contribution guide
might be able to make reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon <[hidden email]> wrote:
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13





--
---
Takeshi Yamamuro
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
In few days, I will wrote this in our guidelines probably after rewording it a bit better:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format.

Please let me know if you have any different opinion about what/when to write the JIRA ID as the prefix.
I would like to make sure this simple rule is closer to the actual practice from you guys.


2019년 11월 12일 (화) 오전 8:41, Gengliang <[hidden email]>님이 작성:
+1 for making it a guideline. This is helpful when the test cases are moved to a different file.

On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro <[hidden email]> wrote:
+1 for having that consistent rule in test names.
This is a trivial problem though, I think documenting this rule in the contribution guide
might be able to make reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon <[hidden email]> wrote:
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13





--
---
Takeshi Yamamuro
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Gabor Somogyi
+1 for having that consistent rule in test names.
+1 for making it a guideline.
+1 defining exact guides in general.

Until now I've followed the alternative (only add the prefix when the JIRA's type is bug) and that way I knew that such tests contain edge cases.
In case of new features I'm pretty sure there is a reason to introduce it but at the moment can't imagine a use-case where it can help us (want to convert it to daily routine).

> This is helpful when the test cases are moved to a different file.
The test can be found by name without jira ID


On Tue, Nov 12, 2019 at 5:31 AM Hyukjin Kwon <[hidden email]> wrote:
In few days, I will wrote this in our guidelines probably after rewording it a bit better:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format.

Please let me know if you have any different opinion about what/when to write the JIRA ID as the prefix.
I would like to make sure this simple rule is closer to the actual practice from you guys.


2019년 11월 12일 (화) 오전 8:41, Gengliang <[hidden email]>님이 작성:
+1 for making it a guideline. This is helpful when the test cases are moved to a different file.

On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro <[hidden email]> wrote:
+1 for having that consistent rule in test names.
This is a trivial problem though, I think documenting this rule in the contribution guide
might be able to make reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon <[hidden email]> wrote:
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13





--
---
Takeshi Yamamuro
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Dongjoon Hyun-2
Thank you for the suggestion, Hyukjin.

Previously, we added Jira IDs for the bug fix PR test cases as Gabor said.

For the new features (and improvements), we didn't add them

because all test cases in the newly added test suite share the same prefix JIRA ID in that case.

It might looks redundant.

However, I'm +1 for Hyukjin's original suggestion because we had better have the official rule for this in some ways.

Thank you again, Hyukjin.

Bests,
Dongjoon.



On Tue, Nov 12, 2019 at 1:13 AM Gabor Somogyi <[hidden email]> wrote:
+1 for having that consistent rule in test names.
+1 for making it a guideline.
+1 defining exact guides in general.

Until now I've followed the alternative (only add the prefix when the JIRA's type is bug) and that way I knew that such tests contain edge cases.
In case of new features I'm pretty sure there is a reason to introduce it but at the moment can't imagine a use-case where it can help us (want to convert it to daily routine).

> This is helpful when the test cases are moved to a different file.
The test can be found by name without jira ID


On Tue, Nov 12, 2019 at 5:31 AM Hyukjin Kwon <[hidden email]> wrote:
In few days, I will wrote this in our guidelines probably after rewording it a bit better:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format.

Please let me know if you have any different opinion about what/when to write the JIRA ID as the prefix.
I would like to make sure this simple rule is closer to the actual practice from you guys.


2019년 11월 12일 (화) 오전 8:41, Gengliang <[hidden email]>님이 작성:
+1 for making it a guideline. This is helpful when the test cases are moved to a different file.

On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro <[hidden email]> wrote:
+1 for having that consistent rule in test names.
This is a trivial problem though, I think documenting this rule in the contribution guide
might be able to make reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon <[hidden email]> wrote:
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13





--
---
Takeshi Yamamuro
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

keypoint
+1

Two confusions to clarify:
1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?

Thank you Hyukjin :)

On Tue, Nov 12, 2019 at 1:47 PM Dongjoon Hyun <[hidden email]> wrote:
Thank you for the suggestion, Hyukjin.

Previously, we added Jira IDs for the bug fix PR test cases as Gabor said.

For the new features (and improvements), we didn't add them

because all test cases in the newly added test suite share the same prefix JIRA ID in that case.

It might looks redundant.

However, I'm +1 for Hyukjin's original suggestion because we had better have the official rule for this in some ways.

Thank you again, Hyukjin.

Bests,
Dongjoon.



On Tue, Nov 12, 2019 at 1:13 AM Gabor Somogyi <[hidden email]> wrote:
+1 for having that consistent rule in test names.
+1 for making it a guideline.
+1 defining exact guides in general.

Until now I've followed the alternative (only add the prefix when the JIRA's type is bug) and that way I knew that such tests contain edge cases.
In case of new features I'm pretty sure there is a reason to introduce it but at the moment can't imagine a use-case where it can help us (want to convert it to daily routine).

> This is helpful when the test cases are moved to a different file.
The test can be found by name without jira ID


On Tue, Nov 12, 2019 at 5:31 AM Hyukjin Kwon <[hidden email]> wrote:
In few days, I will wrote this in our guidelines probably after rewording it a bit better:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format.

Please let me know if you have any different opinion about what/when to write the JIRA ID as the prefix.
I would like to make sure this simple rule is closer to the actual practice from you guys.


2019년 11월 12일 (화) 오전 8:41, Gengliang <[hidden email]>님이 작성:
+1 for making it a guideline. This is helpful when the test cases are moved to a different file.

On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro <[hidden email]> wrote:
+1 for having that consistent rule in test names.
This is a trivial problem though, I think documenting this rule in the contribution guide
might be able to make reviewer overhead a little smaller.

Bests,
Takeshi

On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon <[hidden email]> wrote:
Hi all,

Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.

Currently we have many test case names with prefixes as below:
  • test("SPARK-XXXXX blah blah")
  • test("SPARK-XXXXX: blah blah")
  • test("SPARK-XXXXX - blah blah")
  • test("[SPARK-XXXXX] blah blah")
It is a good practice to have the JIRA ID in general because, for instance,
it makes us put less efforts to track commit histories (or even when the files
are totally moved), or to track related information of tests failed.
Considering Spark's getting big, I think it's good to document.

I would like to suggest this and document it in our guideline:

1. Add a prefix into a test name when a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format which is used in our code base most
      often[1].

We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.

[1]
git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
     923
git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
     477
git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
      16
git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
      13





--
---
Takeshi Yamamuro
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Sean Owen-2
In reply to this post by Hyukjin Kwon
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:

>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Shixiong(Ryan) Zhu
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
Yeah, sounds good to have it.

In case of R, it seems not quite common to write down JIRA ID [1] but looks some have the prefix in its test name in general.
In case of Python and Java, seems we time to time write a JIRA ID in the comment right under the test method [2][3].

Given this pattern, I would like to suggest use the same format but:

1. For Python and Java, write a single comment that starts with JIRA ID and short description, e.g. (SPARK-XXXXX: test blah blah)
2. For R, use JIRA ID as a prefix for its test name.

[1] git grep -r "SPARK-" -- '*test*.R'
[2] git grep -r "SPARK-" -- '*Suite.java'
[3] git grep -r "SPARK-" -- '*test*.py'

Does that make sense? Adding Felix and Shivaram too.


2019년 11월 15일 (금) 오전 3:13, Shixiong(Ryan) Zhu <[hidden email]>님이 작성:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Felix Cheung
this is about test description and not test file name right?

if yes I don’t see a problem.


From: Hyukjin Kwon <[hidden email]>
Sent: Thursday, November 14, 2019 6:03:02 PM
To: Shixiong(Ryan) Zhu <[hidden email]>
Cc: dev <[hidden email]>; Felix Cheung <[hidden email]>; Shivaram Venkataraman <[hidden email]>
Subject: Re: Adding JIRA ID as the prefix for the test case name
 
Yeah, sounds good to have it.

In case of R, it seems not quite common to write down JIRA ID [1] but looks some have the prefix in its test name in general.
In case of Python and Java, seems we time to time write a JIRA ID in the comment right under the test method [2][3].

Given this pattern, I would like to suggest use the same format but:

1. For Python and Java, write a single comment that starts with JIRA ID and short description, e.g. (SPARK-XXXXX: test blah blah)
2. For R, use JIRA ID as a prefix for its test name.

[1] git grep -r "SPARK-" -- '*test*.R'
[2] git grep -r "SPARK-" -- '*Suite.java'
[3] git grep -r "SPARK-" -- '*test*.py'

Does that make sense? Adding Felix and Shivaram too.


2019년 11월 15일 (금) 오전 3:13, Shixiong(Ryan) Zhu <[hidden email]>님이 작성:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Steve Loughran-2
In reply to this post by Shixiong(Ryan) Zhu
 Junit5: Display names. 

Goes all the way to the XML.


On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <[hidden email]> wrote:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
DisplayName looks good in general but actually here I would like first to find a existing pattern to document in guidelines given the actual existing practice we all are used to. I'm trying to be very conservative since this guidelines affect everybody.

I think it might be better to discuss separately if we want to change what we have been used to.

Also, using arbitrary names might not be actually free due to such bug like https://github.com/apache/spark/pull/25630 . It will need some more efforts to investigate as well.

On Fri, 15 Nov 2019, 20:56 Steve Loughran, <[hidden email]> wrote:
 Junit5: Display names. 

Goes all the way to the XML.


On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <[hidden email]> wrote:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Steve Loughran-2
Test reporters do often contain some assumptions about the characters in the test methods. Historically JUnit XML reporters have never sanitised the method names so XML injection attacks have been fairly trivial. Haven't tried this for a while.

That whole JUnit XML report "standard" was actually put together in the Ant project with <Junitreport> doing the postprocessing of the JUnit run. It was driven by the team's XSL skills than any overreaching strategic goal about how to present test results of tests which could run for hours and whose output you would really want to aggregate the locks from multiple machines and processes and present in awake you can actually navigate. With hindsight, a key failing is that we chose to store the test summaries (test count, failure count...) as attributes on the root XML mode. Which is why the whole DOM gets built up in the JUnit runner. Which is why when that JUnit process crashes, you get no report at all.
 
It'd be straightforward to fix -except too much relies on that file now...important things will break. And the maven runner has historically never supported custom reporters, to let you experiment with it.

Maybe this is an opportunity to change things.

On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <[hidden email]> wrote:
DisplayName looks good in general but actually here I would like first to find a existing pattern to document in guidelines given the actual existing practice we all are used to. I'm trying to be very conservative since this guidelines affect everybody.

I think it might be better to discuss separately if we want to change what we have been used to.

Also, using arbitrary names might not be actually free due to such bug like https://github.com/apache/spark/pull/25630 . It will need some more efforts to investigate as well.

On Fri, 15 Nov 2019, 20:56 Steve Loughran, <[hidden email]> wrote:
 Junit5: Display names. 

Goes all the way to the XML.


On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <[hidden email]> wrote:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
Actually there are not so many Java test cases in Spark (because Scala runs on JVM as everybody knows)[1].

Given that, I think we can avoid to put some efforts on this for now .. I don't mind if somebody wants to give a shot since it looks good anyway but to me I wouldn't spend so much time on this ..

Let me just go ahead as I suggested if you don't mind. Anyone can give a shot for Display Name - I'm willing to actively review and help.

[1]
git ls-files '*Suite.java' | wc -l
     172
git ls-files '*Suite.scala' | wc -l
    1161

2019년 11월 18일 (월) 오전 3:27, Steve Loughran <[hidden email]>님이 작성:
Test reporters do often contain some assumptions about the characters in the test methods. Historically JUnit XML reporters have never sanitised the method names so XML injection attacks have been fairly trivial. Haven't tried this for a while.

That whole JUnit XML report "standard" was actually put together in the Ant project with <Junitreport> doing the postprocessing of the JUnit run. It was driven by the team's XSL skills than any overreaching strategic goal about how to present test results of tests which could run for hours and whose output you would really want to aggregate the locks from multiple machines and processes and present in awake you can actually navigate. With hindsight, a key failing is that we chose to store the test summaries (test count, failure count...) as attributes on the root XML mode. Which is why the whole DOM gets built up in the JUnit runner. Which is why when that JUnit process crashes, you get no report at all.
 
It'd be straightforward to fix -except too much relies on that file now...important things will break. And the maven runner has historically never supported custom reporters, to let you experiment with it.

Maybe this is an opportunity to change things.

On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <[hidden email]> wrote:
DisplayName looks good in general but actually here I would like first to find a existing pattern to document in guidelines given the actual existing practice we all are used to. I'm trying to be very conservative since this guidelines affect everybody.

I think it might be better to discuss separately if we want to change what we have been used to.

Also, using arbitrary names might not be actually free due to such bug like https://github.com/apache/spark/pull/25630 . It will need some more efforts to investigate as well.

On Fri, 15 Nov 2019, 20:56 Steve Loughran, <[hidden email]> wrote:
 Junit5: Display names. 

Goes all the way to the XML.


On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <[hidden email]> wrote:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon
Let me document as below in few days:

1. For Python and Java, write a single comment that starts with JIRA ID and short description, e.g. (SPARK-XXXXX: test blah blah)
2. For R, use JIRA ID as a prefix for its test name.

assuming everybody is happy.

2019년 11월 18일 (월) 오전 11:36, Hyukjin Kwon <[hidden email]>님이 작성:
Actually there are not so many Java test cases in Spark (because Scala runs on JVM as everybody knows)[1].

Given that, I think we can avoid to put some efforts on this for now .. I don't mind if somebody wants to give a shot since it looks good anyway but to me I wouldn't spend so much time on this ..

Let me just go ahead as I suggested if you don't mind. Anyone can give a shot for Display Name - I'm willing to actively review and help.

[1]
git ls-files '*Suite.java' | wc -l
     172
git ls-files '*Suite.scala' | wc -l
    1161

2019년 11월 18일 (월) 오전 3:27, Steve Loughran <[hidden email]>님이 작성:
Test reporters do often contain some assumptions about the characters in the test methods. Historically JUnit XML reporters have never sanitised the method names so XML injection attacks have been fairly trivial. Haven't tried this for a while.

That whole JUnit XML report "standard" was actually put together in the Ant project with <Junitreport> doing the postprocessing of the JUnit run. It was driven by the team's XSL skills than any overreaching strategic goal about how to present test results of tests which could run for hours and whose output you would really want to aggregate the locks from multiple machines and processes and present in awake you can actually navigate. With hindsight, a key failing is that we chose to store the test summaries (test count, failure count...) as attributes on the root XML mode. Which is why the whole DOM gets built up in the JUnit runner. Which is why when that JUnit process crashes, you get no report at all.
 
It'd be straightforward to fix -except too much relies on that file now...important things will break. And the maven runner has historically never supported custom reporters, to let you experiment with it.

Maybe this is an opportunity to change things.

On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <[hidden email]> wrote:
DisplayName looks good in general but actually here I would like first to find a existing pattern to document in guidelines given the actual existing practice we all are used to. I'm trying to be very conservative since this guidelines affect everybody.

I think it might be better to discuss separately if we want to change what we have been used to.

Also, using arbitrary names might not be actually free due to such bug like https://github.com/apache/spark/pull/25630 . It will need some more efforts to investigate as well.

On Fri, 15 Nov 2019, 20:56 Steve Loughran, <[hidden email]> wrote:
 Junit5: Display names. 

Goes all the way to the XML.


On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <[hidden email]> wrote:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding JIRA ID as the prefix for the test case name

Hyukjin Kwon

2019년 11월 19일 (화) 오전 9:22, Hyukjin Kwon <[hidden email]>님이 작성:
Let me document as below in few days:

1. For Python and Java, write a single comment that starts with JIRA ID and short description, e.g. (SPARK-XXXXX: test blah blah)
2. For R, use JIRA ID as a prefix for its test name.

assuming everybody is happy.

2019년 11월 18일 (월) 오전 11:36, Hyukjin Kwon <[hidden email]>님이 작성:
Actually there are not so many Java test cases in Spark (because Scala runs on JVM as everybody knows)[1].

Given that, I think we can avoid to put some efforts on this for now .. I don't mind if somebody wants to give a shot since it looks good anyway but to me I wouldn't spend so much time on this ..

Let me just go ahead as I suggested if you don't mind. Anyone can give a shot for Display Name - I'm willing to actively review and help.

[1]
git ls-files '*Suite.java' | wc -l
     172
git ls-files '*Suite.scala' | wc -l
    1161

2019년 11월 18일 (월) 오전 3:27, Steve Loughran <[hidden email]>님이 작성:
Test reporters do often contain some assumptions about the characters in the test methods. Historically JUnit XML reporters have never sanitised the method names so XML injection attacks have been fairly trivial. Haven't tried this for a while.

That whole JUnit XML report "standard" was actually put together in the Ant project with <Junitreport> doing the postprocessing of the JUnit run. It was driven by the team's XSL skills than any overreaching strategic goal about how to present test results of tests which could run for hours and whose output you would really want to aggregate the locks from multiple machines and processes and present in awake you can actually navigate. With hindsight, a key failing is that we chose to store the test summaries (test count, failure count...) as attributes on the root XML mode. Which is why the whole DOM gets built up in the JUnit runner. Which is why when that JUnit process crashes, you get no report at all.
 
It'd be straightforward to fix -except too much relies on that file now...important things will break. And the maven runner has historically never supported custom reporters, to let you experiment with it.

Maybe this is an opportunity to change things.

On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <[hidden email]> wrote:
DisplayName looks good in general but actually here I would like first to find a existing pattern to document in guidelines given the actual existing practice we all are used to. I'm trying to be very conservative since this guidelines affect everybody.

I think it might be better to discuss separately if we want to change what we have been used to.

Also, using arbitrary names might not be actually free due to such bug like https://github.com/apache/spark/pull/25630 . It will need some more efforts to investigate as well.

On Fri, 15 Nov 2019, 20:56 Steve Loughran, <[hidden email]> wrote:
 Junit5: Display names. 

Goes all the way to the XML.


On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <[hidden email]> wrote:
Should we also add a guideline for non Scala tests? Other languages (Java, Python, R) don't support using string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[hidden email]> wrote:

2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[hidden email]>님이 작성:
> In general a test should be self descriptive and I don't think we should be adding JIRA ticket references wholesale. Any action that the reader has to take to understand why a test was introduced is one too many. However in some cases the thing we are trying to test is very subtle and in that case a reference to a JIRA ticket might be useful, I do still feel that this should be a backstop and that properly documenting your tests is a much better way of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix harms this point. Probably I should add this sentence in the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think it's fine to stick to this practice and make it simpler and clear to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA ID to it?
Yea, let's don't do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times and each committer seems having a different understanding on this.
It's not a biggie but wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure. From what I heard, people adds the JIRA in cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow? 

Or I can combine as below (im gonna reword when I actually document this):
1. In general, we should add a JIRA ID as prefix of a test when a PR targets to fix a specific issue.
    In practice, it usually happens when a JIRA type is a bug or a PR adds a couple of tests.
2. Uses "SPARK-XXXX: test name" format

If we have no objection with ^, let me go with this.

2019년 11월 13일 (수) 오전 8:14, Sean Owen <[hidden email]>님이 작성:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[hidden email]> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into Spark dev and community. I think it's time to discuss about when/which format to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-XXXXX blah blah")
> test("SPARK-XXXXX: blah blah")
> test("SPARK-XXXXX - blah blah")
> test("[SPARK-XXXXX] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related information of tests failed.
> Considering Spark's getting big, I think it's good to document.
>
> I would like to suggest this and document it in our guideline:
>
> 1. Add a prefix into a test name when a PR adds a couple of tests.
> 2. Uses "SPARK-XXXX: test name" format which is used in our code base most
>       often[1].
>
> We should make it simple and clear but closer to the actual practice. So, I would like to listen to what other people think. I would appreciate if you guys give some feedback about when to add the JIRA prefix. One alternative is that, we only add the prefix when the JIRA's type is bug.
>
> [1]
> git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>      923
> git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>      477
> git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>       16
> git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>       13
>
>
>