[DISCUSS] Spark cannot identify the problem executor
We've been using Spark 2.3 with blacklisting enabled, and we often hit the following problem: when executor A has an issue (such as a connection problem), tasks on executor B and executor C fail, reporting that they cannot read from executor A. Eventually the job fails because a task on executor B has failed 4 times.
I wonder whether there is an existing fix, or any prior discussion, on how to identify executor A as the problem node.
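
To make the scenario concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API; the failure records, executor names, and function names are all hypothetical. It only illustrates the attribution problem: counting failures by the executor that ran the task points at B, while counting by the executor that could not be read from points at A.

```python
from collections import Counter

# Hypothetical failure records, one per failed task attempt:
# (executor the task ran on, executor it could not read from).
failures = [
    ("B", "A"),
    ("B", "A"),
    ("C", "A"),
    ("B", "A"),
    ("C", "A"),
    ("B", "A"),
]


def count_by_reporter(records):
    """Count failures against the executor that ran the failing task."""
    return Counter(reporter for reporter, _ in records)


def count_by_source(records):
    """Count failures against the executor that could not be read from."""
    return Counter(source for _, source in records)


def suspect_executor(records):
    """Return the executor most often named as the unreadable source, or None."""
    by_source = count_by_source(records)
    if not by_source:
        return None
    return by_source.most_common(1)[0][0]


if __name__ == "__main__":
    print("failures by reporting executor:", dict(count_by_reporter(failures)))
    print("failures by source executor:", dict(count_by_source(failures)))
    print("suspect:", suspect_executor(failures))
```

In this toy data, executor B accumulates 4 failures even though every failure names A as the source, which is the behaviour described above.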