Integration testing and Scheduler Backends

Integration testing and Scheduler Backends

Anirudh Ramanathan-2
This is with regard to the Kubernetes Scheduler Backend and scaling the process of accepting contributions. Given we're moving past upstreaming changes from our fork and into getting new patches, I wanted to start this discussion sooner rather than later. This is more of a post-2.3 question - not something we're looking to solve right away.

While unit tests are handy, they're not nearly as good at giving us confidence as a successful run of our integration tests against single- and multi-node k8s clusters. Currently, we have integration testing set up at https://github.com/apache-spark-on-k8s/spark-integration, and it's running continuously against apache/spark:master in pepperdata-jenkins (on minikube) and k8s-testgrid (in GKE clusters). Now, the question is: how do we make integration tests part of the PR author's workflow?

1. Keep the integration tests in the separate repo and require, as a policy, that contributors run them and add new tests before their PRs are accepted. Given that minikube is easy to set up and can run on a single node, this would certainly be possible. Friction, however, stems from contributors potentially having to modify the integration test code hosted in that separate repository when adding or changing functionality in the scheduler backend. It's also certain to lead to at least brief inconsistencies between the two repositories.

2. Alternatively, we check in the integration tests alongside the actual scheduler backend code. This would work really well and is what we did in our fork. It would have to be a separate package that takes certain parameters (like the cluster endpoint) and runs the integration test code against a local or remote cluster. It would include at least some code dealing with accessing the cluster, reading results from K8s containers, test fixtures, etc.
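To make (2) a bit more concrete, here is a minimal sketch of what a parameterized entry point for such an in-tree test package could look like. The class name, property name, and default endpoint below are all hypothetical illustrations, not actual Spark code:

```java
// Hypothetical sketch of option (2): an in-tree integration-test entry
// point parameterized by cluster endpoint. All names here are
// illustrative, not actual Spark code.
import java.util.Properties;

public final class KubernetesIntegrationSuite {

    /** An explicitly supplied endpoint property wins; otherwise fall
     *  back to a local minikube-style address so a contributor can run
     *  the suite on a single-node cluster with no extra flags. */
    static String resolveMaster(Properties props) {
        String explicit = props.getProperty("spark.kubernetes.test.master");
        return (explicit == null || explicit.isEmpty())
                ? "k8s://https://127.0.0.1:8443"   // assumed local default
                : explicit;
    }

    public static void main(String[] args) {
        String master = resolveMaster(System.getProperties());
        System.out.println("Running integration tests against " + master);
        // A real suite would now build a Kubernetes client for this
        // endpoint, submit a small Spark job, and read results back
        // from the driver/executor pods -- omitted in this sketch.
    }
}
```

The point of the parameterization is that the same checked-in package serves both workflows: a contributor runs it as-is against minikube, while CI passes a remote cluster endpoint.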

I see value in adopting (2), given it's a clearer path for contributors and lets us keep the two pieces consistent, but it seems uncommon elsewhere. How do the other backends, i.e. YARN, Mesos, and Standalone, deal with accepting patches and ensuring that they don't break existing clusters? Is there any automation employed for this so far? Would love to get opinions on (1) vs. (2).

Thanks,
Anirudh



Re: Integration testing and Scheduler Backends

Felix Cheung-2
How would (2) be uncommon elsewhere?

On Mon, Jan 8, 2018 at 10:16 PM Anirudh Ramanathan <[hidden email]> wrote:



Re: Integration testing and Scheduler Backends

Anirudh Ramanathan-3
I meant uncommon in Spark, i.e. in the other scheduler backends, AFAIK.
(2) is what we use within the Kubernetes project, for example, and is preferred in general.


On Mon, Jan 8, 2018 at 10:45 PM, Felix Cheung <[hidden email]> wrote:






Re: Integration testing and Scheduler Backends

Timothy Chen
In reply to this post by Anirudh Ramanathan-2
(2) would be ideal, but given the velocity of the main branch, what Mesos
ended up doing was simply having a separate repo, since it would take
too long to merge back into main.

We ended up running it pre-release (or when a major PR was merged) and not on
every PR; I will also comment on asking users to run it.

We did have conversations with Reynold about potentially having the
ability to run the CI on every [Mesos]-tagged PR, but we never got
there.

Tim


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Integration testing and Scheduler Backends

rxin
If we can actually get our act together and have integration tests in Jenkins (perhaps not run on every commit, but run weekly or as pre-release smoke tests), that'd be great. Then it relies less on contributors testing manually.
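For what it's worth, a weekly trigger like that could be wired up along these lines. This is only a hypothetical declarative-pipeline sketch; the node label, build flags, and script path are assumptions, not the project's actual CI configuration:

```groovy
// Hypothetical weekly Jenkins job for the K8s integration tests; all
// names below (agent label, script path) are assumptions.
pipeline {
    agent { label 'spark-k8s-minikube' }   // assumed node with minikube installed
    triggers { cron('H 6 * * 0') }         // weekly, per the suggestion above
    stages {
        stage('Build') {
            steps { sh './build/mvn -Pkubernetes -DskipTests package' }
        }
        stage('Integration tests') {
            // hypothetical wrapper script that points the test suite
            // at the local minikube cluster
            steps { sh './dev/run-k8s-integration-tests.sh' }
        }
    }
}
```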


On Tue, Jan 9, 2018 at 8:09 AM, Timothy Chen <[hidden email]> wrote: