How to develop a listener that collects the statistics for spark sql and execution time for each operator


asma zgolli
Hello,


I'm looking for a way to develop a listener that collects the statistics for Spark SQL queries, as well as the execution time for each physical operator in the physical plan, and stores them in a database.

I want to develop an application similar to the following one:

import org.apache.spark.scheduler._
import org.apache.log4j.LogManager

val logger = LogManager.getLogger("CustomListener")

class CustomListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    logger.warn(s"Stage completed, runTime: ${stageCompleted.stageInfo.taskMetrics.executorRunTime}, " +
      s"cpuTime: ${stageCompleted.stageInfo.taskMetrics.executorCpuTime}")
  }
}

val myListener = new CustomListener
// sc is the active SparkContext
sc.addSparkListener(myListener)
// Run a simple Spark job and note the additional warning messages emitted by the
// CustomListener with Spark execution metrics; for example, run:
spark.time(sql("select count(*) from range(1e4) cross join range(1e4)").show)


but for Spark SQL runtime statistics.
I want to store the same statistics as the ones displayed at http://localhost:4040/SQL/execution/?id=1, i.e. the ones shown in the attached pictures.
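One way to get at those per-operator metrics is a `QueryExecutionListener` (in `org.apache.spark.sql.util`): after each query it receives the `QueryExecution`, whose executed physical plan can be walked node by node, each node exposing its `SQLMetric`s through `metrics`. This is only a sketch, not verified against every Spark version, and the `println` stands in for whatever database write you choose:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

class SQLMetricsListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // durationNs is the total query execution time reported by Spark
    // Walk the executed physical plan; each operator carries its own SQLMetrics
    qe.executedPlan.foreach { node =>
      node.metrics.foreach { case (name, metric) =>
        // Replace this println with an INSERT into your database
        println(s"operator=${node.nodeName} metric=$name value=${metric.value}")
      }
    }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

// spark is the active SparkSession
spark.listenerManager.register(new SQLMetricsListener)
```

Note that the metric values are only final once the query has completed, which is why reading them from `onSuccess` (rather than during execution) matches what the SQL UI page shows.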
 
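Since the SQL tab itself is driven by listener events, another option is to catch the SQL execution start/end events with the same `SparkListener` mechanism as in the snippet above. Again a sketch; the event classes live in `org.apache.spark.sql.execution.ui` and their exact fields may differ across Spark versions:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.{SparkListenerSQLExecutionEnd, SparkListenerSQLExecutionStart}

class SQLExecutionEventListener extends SparkListener {
  // SQL-specific events arrive through onOtherEvent, not a dedicated callback
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case start: SparkListenerSQLExecutionStart =>
      // physicalPlanDescription is the textual plan shown in the SQL UI
      println(s"execution ${start.executionId} started: ${start.physicalPlanDescription}")
    case end: SparkListenerSQLExecutionEnd =>
      // Pair this timestamp with the matching start event to get wall-clock duration
      println(s"execution ${end.executionId} finished at ${end.time}")
    case _ => // ignore non-SQL events
  }
}

// sc is the active SparkContext
sc.addSparkListener(new SQLExecutionEventListener)
```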
Thank you very much.


Yours sincerely,
Asma ZGOLLI

PhD student in data engineering - computer science
email alt:  [hidden email]


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Attachments: 1.jpeg (476K), 2.jpeg (180K)