Spark-Locality: Hinting Spark location of the executor does not take effect.

Nasrulla Khan Haris

Hi Spark developers,

I want to hint Spark to execute tasks on a particular list of hosts. I see that getBlockLocations is used to get the list of hosts from HDFS:

https://github.com/apache/spark/blob/7955b3962ac46b89564e0613db7bea98a1478bf2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L386

Hinting Spark through a custom getBlockLocations that returns an Array of BlockLocations with host IP addresses doesn't help; Spark continues to schedule the tasks on other executor hosts.

Is there something I am doing wrong?

Test:

spark.read.csv()

Thanks,

Nasrulla

 

Re: Spark-Locality: Hinting Spark location of the executor does not take effect.

nakhanha-2
In reply to this post by Nasrulla Khan Haris
Even though I provide a hint with wn4's IP address, Spark schedules the task on wn1:

2020-09-17 01:21:50,038 DEBUG [dag-scheduler-event-loop] scheduler.TaskSetManager: Valid locality levels for TaskSet 11.0: NODE_LOCAL, RACK_LOCAL, ANY
2020-09-17 01:21:50,039 DEBUG [dispatcher-event-loop-0] cluster.YarnScheduler: parentName: , name: TaskSet_11.0, runningTasks: 0
2020-09-17 01:21:50,040 INFO  [dispatcher-event-loop-0] scheduler.TaskSetManager: Starting task 0.0 in stage 11.0 (TID 13, wn1-vegasr.r0erhw3gxezevknl0vbc42vctb.dx.internal.cloudapp.net, executor 6, partition 5, NODE_LOCAL, 4994 bytes)
2020-09-17 01:21:50,040 DEBUG [dispatcher-event-loop-0] scheduler.TaskSetManager: No tasks for locality level NODE_LOCAL, so moving to locality level RACK_LOCAL
2020-09-17 01:21:50,041 DEBUG [dispatcher-event-loop-0] scheduler.TaskSetManager: No tasks for locality level RACK_LOCAL, so moving to locality level ANY
2020-09-17 01:21:50,042 DEBUG [dispatcher-event-loop-0] cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 13 on executor id: 6 hostname: wn1-vegasr.r0erhw3gxezevknl0vbc42vctb.dx.internal.cloudapp.net.
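
One way to read this log: the executor is registered under a fully qualified host name, so a locality hint is presumably only useful if it is the same string. The following is a minimal sketch of that comparison in plain Java. It is a simplified model and not Spark's actual scheduler code: the class name, the method, the example host names and the IP address are all hypothetical, and rack-level locality is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalityMatchSketch {
    // Hypothetical stand-in for the scheduler's view of the cluster:
    // executor IDs grouped by the host name each executor registered with.
    // A hint counts as node-local only if it equals one of those keys exactly.
    static String localityFor(String preferredHost, Map<String, List<String>> executorsByHost) {
        return executorsByHost.containsKey(preferredHost) ? "NODE_LOCAL" : "ANY";
    }

    public static void main(String[] args) {
        Map<String, List<String>> executorsByHost = new HashMap<>();
        executorsByHost.put("wn1.example.internal", List.of("6"));
        executorsByHost.put("wn4.example.internal", List.of("7"));

        // An IP address never equals a registered host name, so the hint is ignored.
        System.out.println(localityFor("10.0.0.7", executorsByHost));
        // The same node expressed as its registered name matches.
        System.out.println(localityFor("wn4.example.internal", executorsByHost));
    }
}
```

In this model the first call yields ANY and the second NODE_LOCAL, which is consistent with a task hinted by IP address being free to land on any executor.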





RE: Spark-Locality: Hinting Spark location of the executor does not take effect.

Nasrulla Khan Haris
In reply to this post by Nasrulla Khan Haris

I was providing the IP address instead of the FQDN. Providing the FQDN helped.
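
For anyone hitting the same issue, here is a minimal sketch of converting IP-based hints to host names before handing them back from a custom block-location method. It assumes reverse DNS is configured on the cluster; FqdnHints, dnsResolve and toFqdns are hypothetical names, and main uses a made-up lookup table instead of real DNS so that it runs anywhere.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Map;
import java.util.function.UnaryOperator;

class FqdnHints {
    // Reverse-resolves a host or IP string through the JVM's name service.
    // getCanonicalHostName() returns the textual IP if no name is found,
    // and the input is returned unchanged if it cannot be parsed or resolved.
    static String dnsResolve(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    // Applies the resolver to every hint; the resolver is a parameter so
    // that a fake can be used in tests instead of real DNS.
    static String[] toFqdns(String[] hosts, UnaryOperator<String> resolver) {
        return Arrays.stream(hosts).map(resolver).toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Made-up mapping standing in for reverse DNS (assumption for the demo).
        Map<String, String> fakeDns = Map.of("10.0.0.7", "wn4.example.internal");
        String[] hints = toFqdns(new String[] {"10.0.0.7"}, h -> fakeDns.getOrDefault(h, h));
        System.out.println(String.join(",", hints));
    }
}
```

On a real cluster one would pass FqdnHints::dnsResolve as the resolver and check that the result matches the host names shown in the scheduler log.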

 

Thanks,

 

From: Nasrulla Khan Haris
Sent: Wednesday, September 16, 2020 4:11 PM
To: [hidden email]
Subject: Spark-Locality: Hinting Spark location of the executor does not take effect.

 
