Supporting Apache Aurora as a cluster manager


Supporting Apache Aurora as a cluster manager

karthik padmanabhan
Hi Spark Devs,

We are using Aurora (http://aurora.apache.org/) as our Mesos framework for running stateless services, and we would like to use Aurora to deploy big data and batch workloads as well. To do this, we have forked Spark and implemented the ExternalClusterManager trait.
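As a rough illustration of the integration point (not the actual POC code), a minimal sketch of a cluster manager plugged in through this trait might look something like the following. The AuroraClusterManager and AuroraSchedulerBackend names and the aurora:// URL scheme are assumed placeholders; since ExternalClusterManager is private[spark], the code has to live inside the org.apache.spark namespace, which is consistent with the fork described above.

```scala
package org.apache.spark.scheduler.aurora

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}

// Hypothetical sketch only -- not the POC described in this thread.
private[spark] class AuroraClusterManager extends ExternalClusterManager {

  // Claim master URLs of the form "aurora://..." (assumed URL scheme).
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("aurora://")

  // Reuse Spark's standard TaskSchedulerImpl for task-level scheduling.
  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  // The backend is where the Aurora-specific work would go: requesting
  // executors from the Aurora scheduler and reporting their state back.
  override def createSchedulerBackend(
      sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend =
    new AuroraSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc, masterURL)

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}

// Hypothetical stub backend; a real one would talk to Aurora's API to launch
// and tear down executor tasks.
private[spark] class AuroraSchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext,
    masterURL: String) extends SchedulerBackend {
  override def start(): Unit = ()             // e.g. create Aurora jobs for executors
  override def stop(): Unit = ()              // e.g. tear down those Aurora jobs
  override def reviveOffers(): Unit = ()      // no-op in this stub
  override def defaultParallelism(): Int =
    sc.conf.getInt("spark.default.parallelism", 2)
}
```

Spark discovers ExternalClusterManager implementations through Java's ServiceLoader, so such a class would also be registered in META-INF/services/org.apache.spark.scheduler.ExternalClusterManager and selected via the spark.master URL.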

The reason for doing this rather than running Spark directly on Mesos is to leverage the roles and quotas that Aurora already provides for admission control, as well as Aurora features such as priority and preemption. Additionally, we would like Aurora to be the only deployment/orchestration system that our users interact with.

We have a working POC where Spark launches jobs through Aurora as the cluster manager. Is this something that could be merged upstream? If so, I can create a design document and an associated JIRA ticket.

Thanks
Karthik

Re: Supporting Apache Aurora as a cluster manager

Mark Hamstra
While it may be worth creating the design doc and JIRA ticket so that we at least have a better idea and a record of what you are talking about, I kind of doubt that we are going to want to merge this into the Spark codebase. That's not because of anything specific to this Aurora effort, but rather because scheduler implementations in general are not going in the preferred direction. There is already some regret that the YARN scheduler wasn't implemented by means of a scheduler plug-in API, and there is likely to be more regret if we continue to go forward with the spark-on-kubernetes SPIP in its present form. I'd guess that we are likely to merge code associated with that SPIP just because Kubernetes has become such an important resource scheduler, but such a merge wouldn't be without some misgivings.

That is because we just can't get into the position of having more and more scheduler implementations in the Spark code, and more and more maintenance overhead to keep up with the idiosyncrasies of all the scheduler implementations. We've really got to get to the kind of plug-in architecture discussed in SPARK-19700 so that scheduler implementations can be done outside of the Spark codebase, release schedule, etc.

My opinion on the subject isn't dispositive on its own, of course, but that is how I'm seeing things right now. 


Re: Supporting Apache Aurora as a cluster manager

karthik padmanabhan
Hi Mark,

Thanks for getting back. I think you raise a very valid point about moving to a plug-in based architecture instead of supporting the idiosyncrasies of different schedulers in the Spark codebase. Yeah, let me write a design doc so that it can at least serve as another data point for how we think about the plug-in architecture discussed in SPARK-19700.

Thanks
Karthik
