Method for gracefully terminating a driver on a standalone master in Spark 2.1+

Method for gracefully terminating a driver on a standalone master in Spark 2.1+

Michael Allman-2
Hello,

In performing our prod cluster upgrade, we've noticed that the behavior for killing a driver is more aggressive. Whereas pre-2.1 the driver runner would only call `Process.destroy`, in 2.1+ it now calls `Process.destroyForcibly` (on Java 8) if the previous `destroy` call does not return within 10 seconds. (Commit in which this change was made: https://github.com/apache/spark/commit/1c9a386c6b6812a3931f3fb0004249894a01f657)

In light of this change, is there still a way to gracefully stop a Spark driver running on a standalone cluster when it takes longer than 10 seconds to shut down? For example, if we have a streaming job with one-minute batches, we want a much longer timeout before force-killing the driver. We'd rather do a `kill -9` ourselves, if necessary, so we can control the termination procedure.
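For anyone unfamiliar with the change, the new kill sequence boils down to the following pattern (a simplified sketch, not the actual DriverRunner code; the grace period is a parameter here, whereas Spark hard-codes it to 10 seconds):

```scala
import java.util.concurrent.TimeUnit

// Sketch of the 2.1+ kill sequence: ask the process to exit,
// wait for a grace period, then force-kill it if it's still alive.
def killProcess(process: Process, graceSecs: Long): Int = {
  process.destroy()                                // graceful: SIGTERM on Unix
  if (!process.waitFor(graceSecs, TimeUnit.SECONDS)) {
    process.destroyForcibly()                      // forcible: SIGKILL on Unix (Java 8+)
    process.waitFor()                              // SIGKILL cannot be caught, so this returns
  }
  process.exitValue()
}
```

Making that grace period configurable (or disabling the forcible step) is essentially what we're asking for.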

Thank you.

Michael
Re: Method for gracefully terminating a driver on a standalone master in Spark 2.1+

Michael Allman-2
As I cannot find a way to gracefully kill an app that takes longer than 10 seconds to shut down, I have reported this issue as a bug:


Michael
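The externally controlled shutdown we have in mind could be scripted along these lines (an illustrative sketch, not Spark code; discovering the driver's PID, e.g. via `jps` or the master UI, is assumed and left out):

```scala
import java.util.concurrent.TimeUnit

// Probe whether a process exists; `kill -0` sends no signal, it only checks.
def isAlive(pid: Long): Boolean =
  new ProcessBuilder("kill", "-0", pid.toString).start().waitFor() == 0

// Hypothetical external shutdown: send SIGTERM ourselves, allow a grace
// period sized to the job (e.g. several batch intervals for a streaming
// app), and only fall back to kill -9 after that period expires.
def terminateDriver(pid: Long, graceSecs: Long): Unit = {
  def signal(sig: String): Unit = {
    new ProcessBuilder("kill", s"-$sig", pid.toString).start().waitFor()
    ()
  }
  signal("TERM")                                   // request a clean shutdown
  val deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(graceSecs)
  while (isAlive(pid) && System.nanoTime() < deadline) Thread.sleep(500)
  if (isAlive(pid)) signal("KILL")                 // last resort: kill -9
}
```

This only works if the master's own 10-second force-kill can be kept out of the way, which is the crux of the issue reported above.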

On May 4, 2017, at 4:15 PM, Michael Allman <[hidden email]> wrote:

