Run a specific PySpark test or group of tests

Nicholas Chammas

Say you’re working on something and you want to rerun the PySpark tests, focusing on a specific test or group of tests. Is there a way to do that?

I know that you can test entire modules with this:

./python/run-tests --modules pyspark-sql

But I’m looking for something more granular, like pytest’s -k option.
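(For reference, -k selects tests by a keyword expression matched against test names; against a pytest-runnable suite it would look roughly like this, using the sql tests file from our tree — hypothetical, since our suite isn't wired up for pytest yet:)

cd python
pytest pyspark/sql/tests.py -v -k string  # hypothetical: assumes the suite can run under plain pytest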

On that note, does anyone else think it would be valuable to use a test runner like pytest to run our Python tests? The biggest benefits would be the use of fixtures and more flexibility in test running and reporting. Just wondering if we’ve already considered this.

Nick

Re: Run a specific PySpark test or group of tests

Hyukjin Kwon
For me, I would like this if it can be done with relatively small changes.
How about adding more granular options to the run-tests.py script, for example for specifying or filtering a smaller set of test goals?
I think it'd be quite a small change, and if I understood correctly, it would roughly get us to this goal.
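A sketch of what such an option could look like (the --tests flag here is made up, purely to illustrate the idea):

./python/run-tests --modules pyspark-sql --tests 'pyspark.sql.tests.SQLTests'  # hypothetical flag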


Re: Run a specific PySpark test or group of tests

Nicholas Chammas
Pytest does support unittest-based tests, allowing for incremental adoption. I'll see how convenient it is to use with our current test layout.
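(A minimal sketch of that, assuming the pyspark package is importable in the test environment; pytest's collector picks up unittest.TestCase classes out of the box:)

cd python
pytest pyspark/sql/tests.py -k ArrowTests  # collects unittest-style tests; no rewrite needed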

Re: Run a specific PySpark test or group of tests

Bryan Cutler
This generally works for me to run just the tests within a class, or even a single test. It's not as flexible as pytest's -k, which would be nice to have.

$ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
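If tests.py dispatches to unittest.main() (as the class-name argument above suggests), appending a method name should narrow it further to a single test (the method name below is hypothetical):

$ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests.test_toPandas  # hypothetical test method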

Re: Run a specific PySpark test or group of tests

Nicholas Chammas

Looks like it doesn’t take too much work to get pytest working on our code base, since it knows how to run unittest tests.

https://github.com/apache/spark/compare/master...nchammas:pytest

For example, I was able to do this from that branch, and it did the right thing, running only the tests with “string” in their name:

python [pytest *]$ ../bin/spark-submit ./pytest-run-tests.py ./pyspark/sql/tests.py -v -k string

However, looking more closely at the whole test setup, I’m hesitant to work any further on this.

My intention was to see if we could leverage pytest, tox, and other test tools that are standard in the Python ecosystem to replace some of the homegrown stuff we have. We have our own test dependency tracking code, our own breakdown of tests into module-scoped chunks, and our own machinery to parallelize test execution. It seems like it would be a lot of work to reap the benefits of using the standard tools while ensuring that we don’t lose any of the benefits our current test setup provides.

Nick

Re: Run a specific PySpark test or group of tests

Hyukjin Kwon
Hey all, I more or less met this goal with a minimised fix that keeps the existing framework and options. See:

https://github.com/apache/spark/pull/23203
https://github.com/apache/spark-website/pull/161

I know it's not perfect, and other Python testing frameworks provide many other good features, but it should be good enough for now.
Thanks!


Re: Run a specific PySpark test or group of tests

Hyukjin Kwon
It's merged now and documented on the developer tools page: http://spark.apache.org/developer-tools.html#individual-tests
Have some fun with PySpark testing!
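(From the linked page, the new --testnames option takes a module, class, or individual test; the test names below are illustrative, so see the linked page for the exact syntax:)

./python/run-tests --testnames pyspark.sql.tests.test_arrow
./python/run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'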

Re: Run a specific PySpark test or group of tests

cloud0fan
Great job! Thanks a lot!

Re: Run a specific PySpark test or group of tests

Xiao Li
Yes! This is very helpful! 
