[DISCUSS] Spark cannot identify the problem executor

陈晓宇
Hello all,

We have been using Spark 2.3 with blacklisting enabled, and we often hit the following problem: when executor A has an issue (such as a connection problem), tasks on executor B and executor C fail with errors saying they cannot read from executor A. Eventually the job fails because a task on executor B has failed 4 times.

I wonder whether there is an existing fix, or any discussion, on how to identify executor A as the problem node.
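
To make the attribution problem concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API. The record shape (running executor, remote executor) and all function names are hypothetical, chosen only to illustrate the pattern described above: counting failures by the executor that ran the task points at B, while counting by the executor that could not be read from points at A.

```python
from collections import Counter

# Hypothetical failure records, NOT a Spark API:
# (executor that ran the failed task, executor it could not read from).
failures = [
    ("B", "A"),
    ("B", "A"),
    ("C", "A"),
    ("B", "A"),
    ("C", "A"),
    ("B", "A"),
]

# Assumed threshold, matching the "failed 4 times" in the description above.
MAX_TASK_FAILURES = 4


def count_by_running_executor(records):
    """Tally failures against the executor where the task ran."""
    return Counter(running for running, _ in records)


def count_by_remote_executor(records):
    """Tally failures against the executor that could not be read from."""
    return Counter(remote for _, remote in records)


def suspect_executor(records):
    """Return the executor most often named as the unreadable side, or None."""
    by_remote = count_by_remote_executor(records)
    if not by_remote:
        return None
    return by_remote.most_common(1)[0][0]


by_running = count_by_running_executor(failures)
by_remote = count_by_remote_executor(failures)

# Counting by where the task ran blames B once it reaches the threshold.
blamed_by_running = [e for e, n in by_running.items() if n >= MAX_TASK_FAILURES]

print("blamed when counting by running executor:", blamed_by_running)
print("suspect when counting by remote executor:", suspect_executor(failures))
```

In this toy data, every failure names A as the remote side, so grouping by the remote executor singles out A even though all the failed tasks ran on B and C.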

Thanks