Spark Tasks Progress

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Spark Tasks Progress

Sultan Alamro
Hi all,

I am trying to do some actions at the Driver side in Spark while an application is running. The Driver needs to know the tasks progress before making any decision. I know that tasks progress can be accessed within each executor or task from RecordReader class by calling getProgress().

The question is, how can I let the Driver call or have an access to getProgress() method of each task? I thought about using broadcast and accumulator variables, but I don't know how the Driver would distinguish between different tasks.

Note that I am not looking for results displayed in Spark UI.

Any help is appreciated!

Reply | Threaded
Open this post in threaded view
|

RE: Spark Tasks Progress

Ilya Matiach-2

[hidden email] great question.  I had a similar scenario, where workers needed to aggregate host:port information for initializing an MPI ring, and I used direct socket communication between the workers and driver.

 

This is where the driver accepts sockets from workers:

https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMUtils.scala#L125

 

And this is where the workers send information to the driver:

https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala#L347

 

You could probably do something similar, and send the partitionId or some other id to the driver.

 

Hope this helps!

Thank you, Ilya

 

 

From: Sultan Alamro <[hidden email]>
Sent: Friday, September 20, 2019 8:26 PM
To: [hidden email]
Subject: Spark Tasks Progress

 

Hi all,

 

I am trying to do some actions at the Driver side in Spark while an application is running. The Driver needs to know the tasks progress before making any decision. I know that tasks progress can be accessed within each executor or task from RecordReader class by calling getProgress().

The question is, how can I let the Driver call or have an access to getProgress() method of each task? I thought about using broadcast and accumulator variables, but I don't know how the Driver would distinguish between different tasks.

Note that I am not looking for results displayed in Spark UI.

Any help is appreciated!