A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed



zhangliyun
Hi all,
I want to ask a question: my Spark job seems to have been hanging for 20+ hours. In the Spark history UI, the stage page shows 8999 completed tasks with 2 not finished, but when I go to the tasks page I cannot find any running tasks; every task is either failed or successful. My guess is that all tasks actually finished but the driver was never notified. How can I check or verify this guess?

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

[Attachment: Screen Shot 2020-05-07 at 6.35.43 AM.png (138K)]

Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

ML Books
You can click on the stage's description; it redirects to the tasks page. Please share that page.



Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

zhangliyun

The stage shows it is running, with no description; see the attached picture.





Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

angers.zhu
Hi zhangliyun,

Could this be a lost event in the ListenerBus? I have run into this case when the event queue's capacity is too small.




Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

ML Books
This question might sound silly, but how do you set the YARN queue for each stage? When we set the YARN queue at the application level, it is not reflected on the stages page; it always shows "default".





Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

sandeep_katta
In reply to this post by angers.zhu
Yes, we faced this problem too. Try grepping for 'Dropping event from queue' in the driver logs.

If you find it, the problem is the event queue capacity; try increasing spark.scheduler.listenerbus.eventqueue.capacity.
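A minimal sketch of that check (the log file here is a stand-in created for illustration; point grep at your real driver log):

```shell
# Create a sample driver log containing the tell-tale error line (illustration only).
cat > driver.log <<'EOF'
2020-05-05 16:00:39,084 ERROR AsyncEventQueue: Dropping event from queue executorManagement.
EOF

# A non-zero count means listener events were dropped, so the UI / history
# server can no longer be trusted to reflect the scheduler's real state.
grep -c "Dropping event from queue" driver.log   # prints: 1
```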



Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

zhangliyun
Thanks, all. I grepped:

grep "Dropping event from queue" bslv3.stdout

2020-05-05 16:00:39,084 ERROR AsyncEventQueue: Dropping event from queue executorManagement. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

I see the default value of spark.scheduler.listenerbus.eventqueue.capacity is 10000, so should I double it to 20000? Is there any bad effect from enlarging it to 20000?

Best Regards
Kelly Zhang
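Raising the capacity can be done at submit time; a sketch (the class and jar names below are hypothetical). The main cost of a larger queue is somewhat higher driver memory use, since more pending events are buffered, so doubling to 20000 is a modest change:

```shell
spark-submit \
  --conf spark.scheduler.listenerbus.eventqueue.capacity=20000 \
  --class com.example.MyApp \
  my-app.jar
```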



Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

cloud0fan
Which Spark version do you use? If an event is so important that missing it hangs the application forever, we should either prevent it from being dropped or fail earlier with a better error message.



Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

zhangliyun

Spark 2.3.1



Re: A Spark job hangs for 20 hours; the stage page shows 2 tasks not finished, but the task page shows all tasks as finished or failed

cloud0fan
Is it a UI issue, or does your job actually hang forever? You can check the driver-side log and see whether the jobs completed normally.
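A hedged sketch of that driver-log check (the log content below is a fabricated sample; Spark's DAGScheduler normally logs a "Job N finished" line when a job completes):

```shell
# Sample driver log line of the kind the DAGScheduler emits on job completion
# (illustration only; grep your real driver log instead).
cat > driver.log <<'EOF'
2020-05-05 15:59:01,120 INFO DAGScheduler: Job 3 finished: count at App.scala:42, took 128.5 s
EOF

# If every job id shows up as finished, the jobs completed and the apparent
# hang is a UI / listener-event artifact rather than a genuinely stuck job.
grep "Job .* finished" driver.log
```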
