[Spark on Mesos] Spark framework not re-registered and lost after Mesos master restarted


Yu Wei

Hi guys,

I ran into a problem with Spark on Mesos.

I set up a Mesos cluster and launched the Spark framework on it successfully.

Then the Mesos master was killed and restarted.

However, the Spark framework did not re-register with the new master, although the Mesos agents did, and I couldn't find any error logs.

The MesosClusterDispatcher is still running.


I suspect this is a Spark framework issue.

What's your opinion?



Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux


Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Timothy Chen
I think failover isn't enabled for regular Spark job frameworks, since we assume jobs are more ephemeral.

It could be a good setting to add to the Spark framework to enable failover.

Tim

On Mar 30, 2017, at 10:18 AM, Yu Wei <[hidden email]> wrote:



Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Yu Wei

Hi Tim,

I tested the scenario again with the settings below:
[dcos@agent spark-2.0.2-bin-hadoop2.7]$ cat conf/spark-defaults.conf
spark.deploy.recoveryMode  ZOOKEEPER
spark.deploy.zookeeper.url 192.168.111.53:2181
spark.deploy.zookeeper.dir /spark
spark.executor.memory 512M
spark.mesos.principal agent-dev-1

 
However, the case still failed. After the master restarted, the Spark framework did not re-register.
From the Spark framework log, it appears that the method below in MesosClusterScheduler was never called:

override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit
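For context, `reregistered` is the callback the Mesos scheduler driver invokes when it reconnects to a newly elected master. A minimal sketch of the callback (assuming the `org.apache.mesos` Java bindings; the surrounding `Scheduler` implementation and its other required methods are omitted):

```scala
import org.apache.mesos.SchedulerDriver
import org.apache.mesos.Protos.MasterInfo

// Fragment of a Scheduler implementation. This callback fires only when the
// driver manages to re-register with a new master, which in turn requires
// that the framework registered with a non-zero failover_timeout.
override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit = {
  // masterInfo describes the newly elected master.
  println(s"Re-registered with master ${masterInfo.getHostname}:${masterInfo.getPort}")
}
```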

Did I miss something? Any advice?


Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux




From: Timothy Chen <[hidden email]>
Sent: Friday, March 31, 2017 5:13 AM
To: Yu Wei
Cc: [hidden email]; dev
Subject: Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted
 


Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Timothy Chen
Hi Yu,

As mentioned earlier, the Spark framework currently will not re-register because failover_timeout is not set, and there is no configuration available for it yet. Failover is only enabled in MesosClusterScheduler, since that is meant to be an HA framework.

We should add that configuration for users who want their Spark frameworks to survive a master failover, network disconnect, etc.

Tim
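For readers hitting the same wall: a framework opts into failover by setting failover_timeout on its FrameworkInfo before creating the scheduler driver. A minimal sketch (assuming the `org.apache.mesos` Java bindings; the framework name, timeout value, and ZooKeeper master URL are illustrative, and `scheduler` stands for some `org.apache.mesos.Scheduler` implementation, not shown):

```scala
import org.apache.mesos.MesosSchedulerDriver
import org.apache.mesos.Protos.FrameworkInfo

// Ask the master to keep this framework's ID alive for up to a week after a
// disconnect, so the scheduler can re-register after a master failover
// instead of being torn down. 0.0 (the default) disables failover.
val framework = FrameworkInfo.newBuilder()
  .setName("spark-with-failover")        // illustrative name
  .setUser("")                           // empty lets Mesos pick the current user
  .setFailoverTimeout(7 * 24 * 3600.0)   // seconds
  .build()

// The zk:// master URL is illustrative.
val driver = new MesosSchedulerDriver(scheduler, framework, "zk://192.168.111.53:2181/mesos")
driver.run()
```

If the driver reconnects within that window, the master treats it as the same framework and the scheduler's `reregistered` callback is invoked.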

On Thu, Mar 30, 2017 at 8:25 PM, Yu Wei <[hidden email]> wrote:




Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Yu Wei

Got that.


Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux


From: Timothy Chen <[hidden email]>
Sent: Friday, March 31, 2017 11:33:42 AM
To: Yu Wei
Cc: dev; [hidden email]
Subject: Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted
 