python tests: any reason for a huge tests.py?

python tests: any reason for a huge tests.py?

Imran Rashid-4
Hi,

Another question from looking more at Python recently: is there any reason we've got a ton of tests in one humongous tests.py file, rather than breaking it out into smaller files?

Having one huge file doesn't seem great for code organization, and it also makes the test parallelization in run-tests.py not work as well.  On my laptop, tests.py takes 150s, and the next longest test file takes only 20s.
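
To put rough numbers on the parallelization point, here is a minimal sketch. It assumes a simple greedy scheduler (longest file first, each file run whole on the least-loaded worker), which is not necessarily how run-tests.py schedules things. Only the 150s and 20s timings come from my laptop; the 10s files, the four workers, and the five-way split of tests.py are made up for illustration.

```python
import heapq


def makespan(durations, workers):
    """Wall-clock time when each test file runs whole on one worker.

    Files are handed out longest-first to whichever worker is least loaded.
    This is an assumed scheduling model, not a description of run-tests.py.
    """
    loads = [0.0] * workers
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        # Give the next file to the worker that currently finishes earliest.
        heapq.heappush(loads, heapq.heappop(loads) + d)
    return max(loads)


# 150s (tests.py) and 20s are measured; the three 10s files are hypothetical.
one_huge_file = [150, 20, 10, 10, 10]

# Same total work, with tests.py cut into five hypothetical 30s files.
split_files = [30] * 5 + [20, 10, 10, 10]

print(makespan(one_huge_file, workers=4))  # 150.0
print(makespan(split_files, workers=4))    # 60.0
```

However many workers you add, the run can never finish faster than its longest single file, so the 150s file sets the floor until it is split.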

Can we at least try to put new tests into smaller files?

thanks,
Imran

Re: python tests: any reason for a huge tests.py?

rxin
We should break it. 

On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid <[hidden email]> wrote:

Re: python tests: any reason for a huge tests.py?

Imran Rashid-4

On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin <[hidden email]> wrote:

Re: python tests: any reason for a huge tests.py?

Imran Rashid-4
So I've had some offline discussion around this, and I'd like to clarify.  SPARK-25344 may be some non-trivial work, as it's a significant refactoring.

But can we agree on an *immediate* first step: all new python tests should go into their own files?  Is there some reason to not do that right away?

I understand that in some cases, you'll want to add a test case that really is related to an existing test already in those giant files, and it makes sense for you to keep them close.  It's fine to decide on a case-by-case basis whether we should do the refactoring for that relevant bit at the same time or just put the new test in the same file.  But we should still have this *goal* in mind, so you should do it in the cases where the new tests really are independent.

That avoids making the problem worse until we get to SPARK-25344, and furthermore it will allow work on SPARK-25344 to eventually proceed without never-ending merge conflicts with other changes that are also adding new tests.

On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid <[hidden email]> wrote:


Re: python tests: any reason for a huge tests.py?

Bryan Cutler
Hi Imran,

I agree it would be good to split up the tests, but there might be a couple of things to discuss first. Right now we have a single "tests.py" for each subpackage. I think it makes sense to roughly have a test file for most modules, e.g. "test_rdd.py", but it might not always be clear-cut and there could be other ways to split them up.  Also, should we put the test files in the same directory as the source or in a subdirectory named "tests"? My preference is for a subdirectory.

As for putting new tests into their own files right away, it seems better to me to keep them with related tests for now and do the separation as its own task, to avoid fragmenting the test suites. If it's done incrementally, I don't think merge conflicts will cause a problem. Let me summarize this in SPARK-25344.
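
For illustration, here is a rough sketch of the subdirectory option using only the standard unittest module. It is one possible layout, not an agreed design: the file name "test_rdd_sketch.py", the class RDDLikeTests, and the helpers build_layout and run_discovered are all hypothetical, and the test body is plain Python because a real PySpark test would need a SparkContext.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path

# Hypothetical stand-in for a per-module test file such as "test_rdd.py".
# It deliberately avoids PySpark so the sketch runs anywhere.
TEST_FILE_SOURCE = textwrap.dedent('''
    import unittest

    class RDDLikeTests(unittest.TestCase):
        def test_map(self):
            self.assertEqual([x * 2 for x in range(3)], [0, 2, 4])

    if __name__ == "__main__":
        unittest.main()
''')


def build_layout(root):
    """Create <root>/tests/test_rdd_sketch.py and return the tests directory."""
    tests_dir = Path(root) / "tests"
    tests_dir.mkdir()
    (tests_dir / "test_rdd_sketch.py").write_text(TEST_FILE_SOURCE)
    return tests_dir


def run_discovered(tests_dir):
    """Discover every test_*.py file in tests_dir, run it, return the result."""
    suite = unittest.TestLoader().discover(str(tests_dir), pattern="test_*.py")
    result = unittest.TestResult()
    suite.run(result)
    return result


with tempfile.TemporaryDirectory() as root:
    result = run_discovered(build_layout(root))

print(result.testsRun, result.wasSuccessful())
```

With one small file per module, a runner can pick up each file independently, and new files are found by the same pattern without touching the existing ones.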

Thanks,
Bryan

On Wed, Sep 12, 2018 at 10:48 AM Imran Rashid <[hidden email]> wrote: