Removing the Mesos fine-grained mode

Iulian Dragoș
Hi all,

Mesos is the only cluster manager that has a fine-grained mode, but it's more often than not problematic, and it's a maintenance burden. I'd like to suggest removing it in the 2.0 release.

A few reasons:

- Code/maintenance complexity: the two modes duplicate a lot of functionality (and sometimes code), which leads to subtle differences or bugs. See SPARK-10444, and also this thread and MESOS-3202.
- It's not widely used (Reynold's previous thread got very few responses from people relying on it).
- Similar functionality can be achieved with dynamic allocation + coarse-grained mode (see the sketch below).

I suggest that Spark 1.6 already issue a warning when it detects fine-grained use, with removal following in the 2.0 release.
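For reference, a rough sketch of the replacement setup in application code. The config keys are the standard ones from the docs; the master URL, app name, and executor bounds are placeholders, and note that dynamic allocation also needs the external shuffle service running on each node:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: coarse-grained mode + dynamic allocation in place of fine-grained mode.
    val conf = new SparkConf()
      .setAppName("my-app")                              // placeholder app name
      .setMaster("mesos://zk://zk1:2181/mesos")          // placeholder Mesos master URL
      .set("spark.mesos.coarse", "true")                 // coarse-grained mode
      .set("spark.dynamicAllocation.enabled", "true")    // grow/shrink the executor set at runtime
      .set("spark.shuffle.service.enabled", "true")      // prerequisite for dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "2")  // placeholder bounds
      .set("spark.dynamicAllocation.maxExecutors", "50")
    val sc = new SparkContext(conf)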

Thoughts?

iulian

Re: Removing the Mesos fine-grained mode

Dean Wampler
Sounds like the right move. Simplifies things in important ways.


Re: Removing the Mesos fine-grained mode

Heller, Chris
In reply to this post by Iulian Dragoș
I was one who argued for fine-grained mode, and there is still something I appreciate about how fine-grained mode operates, in terms of how one would define a Mesos framework. That said, with dynamic allocation and Mesos support for resource reservation, oversubscription, and revocation, I think the direction is clear: coarse-grained mode is the proper way forward, and having the two code paths is just noise.

-Chris

Re: Removing the Mesos fine-grained mode

nraychaudhuri
In reply to this post by Iulian Dragoș
It makes sense. I have a small point of confusion about the following:

- similar functionality can be achieved with dynamic allocation + coarse-grained mode

This bit is for my understanding. Let's say I have a Spark streaming job running in coarse-grained mode with dynamic allocation. The docs suggest (and here I might be completely wrong) that it can scale down and then scale back up only to the configured maximum number of executors. So, for example, if I get a traffic spike and add more nodes to my Mesos cluster, they will not be used once the max is reached? Wouldn't fine-grained mode have helped here? I guess we can always set the max high enough that it doesn't matter. Just curious.

Nilanjan
Re: Removing the Mesos fine-grained mode

Jo Voordeckers
In reply to this post by Heller, Chris
As a recent fine-grained mode adopter, I'm now confused after reading this and other resources from Spark Summit, the docs, ... so can someone please advise me on our use case?

We'll have one or two streaming jobs, and we will run scheduled batch jobs that should take resources away from the streaming jobs and give them back upon completion.

Can someone point me at the docs or a guide to set this up? 

Thanks!

- Jo Voordeckers


Re: Removing the Mesos fine-grained mode

Iulian Dragoș
This is a good point. We should probably document this better in the migration notes. In the meantime:

Roughly, dynamic allocation lets Spark add and kill executors based on the scheduling delay. The min and max number of executors can be configured. Would this fit your use case?
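The main knobs, with illustrative values (key names are from the configuration docs; this is a sketch, not a recommendation for your workload):

    import org.apache.spark.SparkConf

    // Rough sketch of the dynamic allocation knobs; all values are illustrative.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")                  // prerequisite
      .set("spark.dynamicAllocation.minExecutors", "1")              // floor when scaling down
      .set("spark.dynamicAllocation.maxExecutors", "20")             // ceiling when scaling up
      .set("spark.dynamicAllocation.schedulerBacklogTimeout", "5s")  // request more executors once tasks have queued this long
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")     // release executors idle this long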

iulian


Re: Removing the Mesos fine-grained mode

Adam McElwee
I've used fine-grained mode on our Mesos Spark clusters until this week, mostly because it was the default. I started trying coarse-grained mode because of the recent chatter on the mailing list about wanting to move the Mesos execution path to coarse-grained only. The odd thing is, coarse-grained vs. fine-grained seems to yield drastically different cluster utilization metrics for any of our jobs that I've tried out this week.

If this is best as a new thread, please let me know, and I'll try not to derail this conversation. Otherwise, details below:

We monitor our Spark clusters with Ganglia, and historically we maintain at least 90% CPU utilization across the cluster. Making a single configuration change to use coarse-grained execution instead of fine-grained consistently yields a CPU utilization pattern that starts around 90% at the beginning of the job and then slowly decreases over the next 1-1.5 hours to level out around 65% CPU utilization on the cluster. Does anyone have a clue why I'd be seeing such a negative effect from switching to coarse-grained mode? GC activity is comparable in both cases. I've tried 1.5.2, as well as the 1.6.0 preview tag that's on GitHub.

Thanks,
-Adam

Re: Removing the Mesos fine-grained mode

Iulian Dragoș


On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee <[hidden email]> wrote:
If this is best as a new thread, please let me know, and I'll try not to derail this conversation.

I think it's ok to discuss it here.

Does anyone have a clue why I'd be seeing such a negative effect from switching to coarse-grained mode? GC activity is comparable in both cases.

I'm not very familiar with Ganglia and how it computes utilization. But one thing comes to mind: did you enable dynamic allocation in coarse-grained mode?

iulian
Re: Removing the Mesos fine-grained mode

Jerry Lam
Hi guys,

Can someone confirm whether it is true that dynamic allocation on Mesos "is designed to run one executor per slave with the configured amount of resources"? (I copied this sentence from the documentation.) Does this mean there is at most one executor per node? Therefore, if you have a big machine, you need to allocate one fat executor on it in order to fully utilize it?

Best Regards,

Sent from my iPhone

Re: Removing the Mesos fine-grained mode

Adam McElwee
In reply to this post by Iulian Dragoș


On Mon, Nov 23, 2015 at 7:36 AM, Iulian Dragoș <[hidden email]> wrote:
I'm not very familiar with Ganglia and how it computes utilization. But one thing comes to mind: did you enable dynamic allocation in coarse-grained mode?

Dynamic allocation is definitely not enabled. The only delta between runs is adding --conf "spark.mesos.coarse=true" to the job submission. Ganglia is just pulling stats from procfs, and I've never seen it report bad results. If I sample any of the 100-200 nodes in the cluster, dstat reflects the same average CPU that I'm seeing reflected in Ganglia.

Re: Removing the Mesos fine-grained mode

Andrew Or-2
@Jerry Lam

Can someone confirm whether it is true that dynamic allocation on Mesos "is designed to run one executor per slave with the configured amount of resources"? (I copied this sentence from the documentation.) Does this mean there is at most one executor per node? Therefore, if you have a big machine, you need to allocate one fat executor on it in order to fully utilize it?

Mesos inherently does not support multiple executors per slave currently. This is actually not related to dynamic allocation. There is, however, an outstanding patch to add support for multiple executors per slave. When that feature is merged, it will work well with dynamic allocation.
 

Re: Removing the Mesos fine-grained mode

Jerry Lam
Hi Andrew,

Thank you for confirming this. I'm asking because I used fine-grained mode before and it was a headache because of memory issues, so I switched to YARN with dynamic allocation. I was wondering if I could switch back to Mesos with coarse-grained mode + dynamic allocation, but from what you explain, I still cannot have more than one executor per slave. This sounds like a deal breaker for me: if I have a slave with 100GB of RAM and a slave with 30GB, I cannot fully utilize the 100GB instance if I specify spark.executor.memory = 20GB. The two slaves will each consume 20GB in this case, even though there is 80GB left on the bigger machine. If I specify 90GB for spark.executor.memory, the only active slave is the one with 100GB, and the slave with 30GB will sit idle.
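To make the arithmetic concrete, here is a quick back-of-the-envelope with the hypothetical numbers above:

    // One executor per slave, and only on slaves big enough to host it.
    val slaveRamGb = Seq(100, 30)  // a 100GB slave and a 30GB slave

    def memoryUtilization(executorGb: Int): Double = {
      val used = slaveRamGb.map(ram => if (ram >= executorGb) executorGb else 0).sum
      used.toDouble / slaveRamGb.sum
    }

    println(memoryUtilization(20))  // 40/130 ~= 0.31: each slave hosts one 20GB executor, 80GB stranded
    println(memoryUtilization(90))  // 90/130 ~= 0.69: only the 100GB slave qualifies, the 30GB one idles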

Do you know the link to the JIRA issue where I can receive updates on the feature you mention? We intend to use Mesos, but it is proving difficult with our tight budget constraints.

Best Regards,

Jerry


Re: Removing the Mesos fine-grained mode

Jerry Lam
In reply to this post by Andrew Or-2
@Andrew Or

I assume you are referring to this ticket [SPARK-5095]: https://issues.apache.org/jira/browse/SPARK-5095 
Thank you!

Best Regards,

Jerry

Re: Removing the Mesos fine-grained mode

Adam McElwee
In reply to this post by Adam McElwee
To eliminate any skepticism around whether CPU is a good performance metric for this workload, I did a couple of comparison runs of an example job to demonstrate a more universal change in performance metrics (stage/job time) between coarse-grained and fine-grained mode on Mesos.

The workload is identical here: pulling tgz archives from S3, parsing JSON lines from the files, and ultimately creating documents to index into Solr. The tasks are not inserting into Solr (just to let you know that there's no network side effect of the map task). The runs are on the exact same hardware in EC2 (m2.4xlarge, with 68GB of RAM and 45GB of executor memory) and the exact same JVM, and the results don't depend on the order of the runs, meaning I get the same results whether I run coarse-grained first or fine-grained first. No other frameworks/tasks are running on the Mesos cluster during the test. I see the same results whether it's a 3-node cluster or a 200-node cluster.

With the CMS collector, the map stage takes roughly 2.9h in fine-grained mode, and 3.4h in coarse-grained mode. Because both modes start out performing similarly, the total execution time gap widens as the job size grows. To put that another way, the difference is much smaller for jobs/stages under 1 hour. When I submit this job for a much larger dataset that takes 5+ hours, the difference in total stage time moves closer and closer to roughly 20-30% longer execution time.

With the G1 collector, the map stage takes roughly 2.2h in fine-grained mode, and 2.7h in coarse-grained mode. Again, the fine-grained and coarse-grained tests are on the exact same machines with the exact same dataset, changing only spark.mesos.coarse between true and false.
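For concreteness, this is the shape of the only settings varied between runs. It's a sketch, not the exact job config; the GC flags are the standard HotSpot ones:

    import org.apache.spark.SparkConf

    // Sketch of the deltas between runs; everything else stays fixed.
    val conf = new SparkConf()
      .set("spark.mesos.coarse", "true")    // flipped between "true" and "false" across runs
      .set("spark.executor.memory", "45g")  // the 45G executors mentioned above
      // GC choice, one of the two collectors compared above:
      .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC")              // G1 runs
      // .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC") // CMS runs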

Let me know if there's anything else I can provide here.

Thanks,
-Adam


Re: Removing the Mesos fine-grained mode

Timothy Chen
Hi Adam,

Thanks for the graphs and the tests. I'm definitely interested in digging a bit deeper to find out what could be the cause of this.

Do you have the Spark driver logs for both runs?

Tim

Re: Removing the Mesos fine-grained mode

Iulian Dragoș
It would be good to get to the bottom of this.

Adam, could you share the Spark app that you're using to test this?

iulian

Re: Removing the Mesos fine-grained mode

Adam McElwee
Sorry, I never got a chance to circle back with the master logs for this. I definitely can't share the job code, since it's used to build a pretty core dataset for my company, but let me see if I can pull some logs together in the next couple days.

Re: Removing the Mesos fine-grained mode

Iulian Dragoș
That'd be great, thanks Adam!

--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com