Yarn Log aggregation for a killed spark streaming job

Shankar Venkataraman
Hi!

We are seeing an issue with log aggregation under YARN for Spark Streaming. The case below is an example: we had to kill a Spark Streaming job and would like to see the consumer's logs to find out what happened before we killed it.
YARN reports a "log aggregation status" of N/A for the killed Spark Streaming job. Log aggregation seems to work correctly for all other jobs, both those that aborted and those that finished normally.

Any clues as to what may be happening? We are using Spark 1.5.2. Is there a fix for this behavior in later releases?

$ yarn application -status application_1460521878257_8455
16/06/14 19:24:54 INFO impl.TimelineClientImpl: Timeline service address: http://abhdp-rm1.marketo.org:8188/ws/v1/timeline/
Application Report :
    Application-Id : application_1460521878257_8455
    Application-Name : ab-crmstreaming-service
    Application-Type : SPARK
    User : crmintegration
    Queue : crm
    Start-Time : 1463694307675
    Finish-Time : 1464848682220
    Progress : 0%
    State : KILLED
    Final-State : KILLED
    Tracking-URL : N/A
    RPC Port : -1
    AM Host : N/A
    Aggregate Resource Allocation : 0 MB-seconds, 0 vcore-seconds
    Log Aggregation Status : N/A
    Diagnostics : N/A
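
For context, the Start-Time and Finish-Time values in the report are epoch milliseconds. The sketch below converts them to UTC and then tries to pull whatever logs were aggregated for the application with `yarn logs`. It assumes GNU `date` and a `yarn` client on the PATH; the helper name `ms_to_utc` and the output file name are illustrative only. Whether `yarn logs` returns anything depends on whether aggregation actually ran for this application.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a YARN epoch-millisecond timestamp to UTC.
# Assumes GNU date (the -d "@seconds" form).
ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
}

APP_ID=application_1460521878257_8455
APP_OWNER=crmintegration

echo "Start:  $(ms_to_utc 1463694307675)"
echo "Finish: $(ms_to_utc 1464848682220)"

# Attempt to fetch aggregated logs; skipped when the yarn CLI is not installed.
# -appOwner is needed when running as a user other than the one who submitted the job.
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID" -appOwner "$APP_OWNER" > "${APP_ID}.log"
fi
```

If the command reports that no logs are available, that would be consistent with the N/A aggregation status above, but it does not by itself explain the cause.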

Re: Yarn Log aggregation for a killed spark streaming job

nsalian
Hello,

Thank you for the question.
In the YARN ResourceManager UI, please find the row with your application ID and open the stderr logs from the logs link on that same row.

This should tell you what the latest activity was.
I am assuming the containers were still in the UNASSIGNED state before you killed the application.

For temporary data, the HDFS /tmp directory should hold application-specific information.
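
As a rough sketch of where to look under /tmp: on many Hadoop 2.x clusters, aggregated logs are written to `<remote-app-log-dir>/<user>/<suffix>/<application id>`. The example assumes the stock values `/tmp/logs` for `yarn.nodemanager.remote-app-log-dir` and `logs` for `yarn.nodemanager.remote-app-log-dir-suffix`; your cluster may override both in yarn-site.xml. The function name `remote_log_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the HDFS directory where aggregated logs would land.
# Arguments: user, application id, optional log root, optional suffix.
# The defaults are assumptions based on stock yarn-site.xml values.
remote_log_dir() {
  local user=$1 app_id=$2 root=${3:-/tmp/logs} suffix=${4:-logs}
  printf '%s/%s/%s/%s\n' "$root" "$user" "$suffix" "$app_id"
}

DIR=$(remote_log_dir crmintegration application_1460521878257_8455)
echo "$DIR"

# List the directory if an HDFS client is available; an empty or missing
# directory suggests nothing was aggregated for this application.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$DIR"
fi
```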

Hope this helps.
Neelesh S. Salian  
Cloudera