
Spark job hangs when History server events are written to hdfs


Spark job hangs when History server events are written to hdfs

Pankaj Arora
Hi, 

I am running a long-running application on YARN using Spark, and I am facing issues with Spark's history server when the events are written to HDFS. It works fine for some time, and then I intermittently see the following exception.

2015-06-01 00:00:03,247 [SparkListenerBus] ERROR org.apache.spark.scheduler.LiveListenerBus - Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:203)
        at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:203)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.util.FileLogger.flush(FileLogger.scala:203)
        at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:90)
        at org.apache.spark.scheduler.EventLoggingListener.onUnpersistRDD(EventLoggingListener.scala:121)
        at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$11.apply(SparkListenerBus.scala:66)
        at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$11.apply(SparkListenerBus.scala:66)
        at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
        at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
        at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:66)
        at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1545)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
Caused by: java.io.IOException: All datanodes 192.168.162.54:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1128)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)




After that, this error keeps recurring and Spark reaches an unstable state where no job is able to make progress.

FYI:
HDFS was up and running before and after this error. On restarting the application, it runs fine for some hours and then the same error appears again.
Enough disk space was available on each data node.
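
For reference, event logging is enabled with configuration along these lines (just a sketch; the app name and log directory below are placeholders, not my actual values):

```scala
import org.apache.spark.SparkConf

// Sketch of the relevant settings: event logging for the history server is
// controlled by spark.eventLog.*, and in my case the directory is on HDFS.
val conf = new SparkConf()
  .setAppName("long-running-app")                      // placeholder name
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-history")  // placeholder HDFS path
```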

Any suggestion or help would be appreciated.

Regards
Pankaj


Re: Spark job hangs when History server events are written to hdfs

Akhil Das
Can you look in the datanode logs and see what's going on? Most likely, you are hitting the ulimit on open file handles.
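
If it is the file-handle limit, a quick way to double-check from inside a JVM (rough sketch; it relies on the com.sun UnixOperatingSystemMXBean, so Unix-like platforms only) is something like:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Compare open vs. maximum file descriptors for this JVM; if the open count
// sits close to the max, the ulimit theory is the likely culprit.
ManagementFactory.getOperatingSystemMXBean match {
  case unix: UnixOperatingSystemMXBean =>
    println(s"open file descriptors: ${unix.getOpenFileDescriptorCount}")
    println(s"max file descriptors:  ${unix.getMaxFileDescriptorCount}")
  case _ =>
    println("file descriptor counts not exposed on this platform")
}
```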

Thanks
Best Regards




Re: Spark job hangs when History server events are written to hdfs

Archit Thakur
As such, we do not open any files ourselves. EventLoggingListener opens the file to write the events in JSON format for the history server. But it uses the same writer (a PrintWriter object) and hence the same output stream (which boils down to a DFSOutputStream for us). It seems DFSOutputStream sets an instance variable, hasError, when an error occurs, and it keeps throwing exceptions even after HDFS comes back up.
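
To illustrate the point (a simplified sketch, not Spark's actual FileLogger/EventLoggingListener code), recovery would mean detecting the failed write and reopening the file instead of reusing the dead stream:

```scala
import java.io.PrintWriter
import org.apache.hadoop.fs.{FileSystem, Path}

// Simplified sketch of an event writer that reopens its HDFS file once the
// underlying DFSOutputStream has gone bad; the real code keeps reusing the
// same stream, which is why the error never clears.
class ReopeningEventLogger(fs: FileSystem, path: Path) {
  private var writer: PrintWriter = open()

  private def open(): PrintWriter =
    new PrintWriter(if (fs.exists(path)) fs.append(path) else fs.create(path))

  def logLine(json: String): Unit = {
    writer.println(json)
    writer.flush()
    // PrintWriter swallows IOExceptions and only raises an internal error flag,
    // so we have to ask it explicitly whether the write failed.
    if (writer.checkError()) {
      writer.close()
      writer = open()   // reopen once HDFS is reachable again
      writer.println(json)
      writer.flush()
    }
  }
}
```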





Re: Spark job hangs when History server events are written to hdfs

Pankaj Arora
I will reproduce this and get the datanode logs, but I remember there was some exception in the datanode logs.
Also, this is reproducible if you restart HDFS in between, and the application does not recover after HDFS comes back up. Shouldn't there be a way to recover from these types of errors?
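
For what it is worth, the "All datanodes ... are bad" message comes out of the append-pipeline recovery path (see setupPipelineForAppendOrRecovery in the trace). One thing worth trying (an assumption on my part, not something I have verified) is relaxing the HDFS client's datanode-replacement policy, which can cause trouble on small clusters during pipeline recovery:

```scala
import org.apache.hadoop.conf.Configuration

// Sketch only: these are standard HDFS client keys, but whether they help
// with this particular hang is an assumption, not a verified fix.
val hadoopConf = new Configuration()
// don't insist on finding a replacement datanode when one in the pipeline fails
hadoopConf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "true")
hadoopConf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER")
```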

Thanks and Regards
Pankaj





Re: Spark job hangs when History server events are written to hdfs

Vel
Hi, I am getting these same exceptions.
What was the fix?
Thanks in advance.