Can Spark avoid Container killed by Yarn?

Yang Zhang
My Spark SQL jobs keep failing with the error "Container exited with a non-zero exit code 143".
I know this is caused by memory usage exceeding the limit allowed by spark.yarn.executor.memoryOverhead. As shown below, a memory allocation request failed at 18/11/08 17:36:05, and the executor then RECEIVED SIGNAL TERM. Can a Spark executor avoid being killed like this?
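For reference, exit code 143 is consistent with the SIGTERM seen in the log, assuming the usual convention that a process killed by signal N reports exit status 128 + N. A minimal sketch of that decoding (the helper name `describe_exit_code` is made up for illustration):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the common 128 + N convention for signals."""
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number back to its symbolic name, e.g. 15 -> SIGTERM.
            name = signal.Signals(signum).name
        except ValueError:
            # Number does not correspond to a known signal on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(143))
```

So 143 only says that the executor was asked to terminate. It does not by itself say which limit was exceeded; the reason for the kill would normally have to be read from the YARN side.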


my conf:
--master yarn-client \
--driver-memory 10G \
--executor-memory 10G \
--executor-cores 5 \
--num-executors 12 \
--conf "spark.executor.extraJavaOptions= -XX:MaxPermSize=256M" \
--conf "spark.sql.shuffle.partitions=200" \
--conf "spark.scheduler.mode=FAIR" \
--conf "spark.yarn.executor.memoryOverhead=2048" \
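
As a rough check of what these settings ask YARN for, the per-executor container request is the executor heap plus the memory overhead. The sketch below works through that arithmetic for the values above. It assumes the documented Spark 2.x default overhead of max(384 MB, 10% of executor memory) when none is set, and it does not model any rounding YARN applies to the request; the function name `container_request_mb` is hypothetical.

```python
def container_request_mb(executor_memory_mb: int, overhead_mb: int = None) -> int:
    """Memory requested from YARN for one executor container, in MB."""
    if overhead_mb is None:
        # Assumed default when spark.yarn.executor.memoryOverhead is not set.
        overhead_mb = max(384, int(executor_memory_mb * 0.10))
    return executor_memory_mb + overhead_mb


executor_mb = 10 * 1024   # --executor-memory 10G
overhead_mb = 2048        # spark.yarn.executor.memoryOverhead=2048
num_executors = 12        # --num-executors 12

per_container = container_request_mb(executor_mb, overhead_mb)
print(per_container)                      # 12288 MB per executor container
print(per_container * num_executors)      # 147456 MB across all executors
print(container_request_mb(executor_mb))  # 11264 MB with the assumed default overhead
```

With this configuration, each executor may use about 12 GB of physical memory in total (heap plus everything off-heap) before it goes over its container request, and that budget is shared by the 5 concurrent tasks set with --executor-cores.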



=============================================================
18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] FileScanRDD: Reading File path: hdfs://phive.smyprd.com:8020/user/hive/warehouse/events/dt=20180103/part-00000, range: 134217728-268435456, partition values: [20180103]
18/11/08 17:35:52 INFO [Executor task launch worker for task 13700] FileScanRDD: Reading File path: hdfs://phive.smyprd.com:8020/user/hive/warehouse/events/dt=20180104/part-00000, range: 402653184-536870912, partition values: [20180104]
18/11/08 17:35:52 INFO [Executor task launch worker for task 13688] FileScanRDD: Reading File path: hdfs://phive.smyprd.com:8020/user/hive/warehouse/events/dt=20180101/part-00000, range: 134217728-268435456, partition values: [20180101]
18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] TorrentBroadcast: Started reading broadcast variable 135
18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] MemoryStore: Block broadcast_135_piece0 stored as bytes in memory (estimated size 27.2 KB, free 1822.3 MB)
18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] TorrentBroadcast: Reading broadcast variable 135 took 3 ms
18/11/08 17:35:52 INFO [Executor task launch worker for task 13694] MemoryStore: Block broadcast_135 stored as values in memory (estimated size 365.6 KB, free 1821.9 MB)
18/11/08 17:36:00 INFO [Executor task launch worker for task 13700] ShuffleExternalSorter: Thread 1100 spilling sort data of 580.0 MB to disk (0  time so far)
18/11/08 17:36:03 INFO [Executor task launch worker for task 13688] ShuffleExternalSorter: Thread 1098 spilling sort data of 580.0 MB to disk (0  time so far)
18/11/08 17:36:05 WARN [Executor task launch worker for task 13694] TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
18/11/08 17:36:05 INFO [Executor task launch worker for task 13694] ShuffleExternalSorter: Thread 1099 spilling sort data of 514.0 MB to disk (0  time so far)
18/11/08 17:36:05 ERROR [SIGTERM handler] CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
18/11/08 17:36:05 INFO [Thread-2] DiskBlockManager: Shutdown hook called
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Shutdown hook called
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data5/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-e6345a15-d684-440a-a4f7-d23884ee9806
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data9/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-23870d5b-9e6f-4587-bf01-eaf4ea986293
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data7/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-65f184dc-af68-422b-9d2b-e09941ff2679
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data15/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-17488560-736e-4ba4-9ae3-f07e1e33afda
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data4/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-745de0ee-aa39-4cea-b05e-6f924006d4a9
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data6/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-2db274c9-0c45-4e15-ad42-7bce16329b31
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data10/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-6f41703c-e844-4130-9800-1cde62e8bf8c
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data3/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-6eb2ce0e-a4d6-4300-8154-965847e671ef
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data12/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-cd6c6d05-052e-4316-b7ff-342c5cfac817
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data2/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-c702d40f-1997-4742-80ea-30a15c6ec738
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data11/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-74777ef3-13c4-43d6-bd84-47cc0aba195e
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data13/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-690ac7f2-9ffe-437a-a4d7-7426b85993ca
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data14/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-9eee2e05-7d13-45fb-abe3-5583942af555
18/11/08 17:36:10 INFO [Thread-2] ShutdownHookManager: Deleting directory /data8/yarn/nm/usercache/admin/appcache/application_1527906298937_39699/spark-5c2d57a1-2201-4fe0-bbbf-8aeaa124cbbf
=============================================================