UnknownSource NullPointerException in CodeGen. with Custom Strategy


UnknownSource NullPointerException in CodeGen. with Custom Strategy

Nasrulla Khan Haris

Hi Spark Developers,

 

I am encountering the NullPointerException below while reading a Parquet file on a multi-node cluster. When I run the same Spark job locally on a single node (my development environment), the error does not occur. I would appreciate your inputs.

 

Thanks in advance,

NKH

 

pqjah.dx.internal.cloudapp.net, executor 1): java.lang.NullPointerException

        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1$agg_FastHashMap_0.hash$(Unknown Source)

        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1$agg_FastHashMap_0.findOrInsert(Unknown Source)

        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doConsume_0$(Unknown Source)

        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)

        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)

        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)

        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)

        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)

        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)

        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)

        at org.apache.spark.scheduler.Task.run(Task.scala:122)

        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)

        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)

        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

 

Driver stacktrace:

  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)

  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)

  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)

  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)

  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)

  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)

  at scala.Option.foreach(Option.scala:257)

  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)

  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)

  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)

  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)

  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)

  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2065)

  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2086)

  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2105)

  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:365)

  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)

  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)

  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)

  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)

  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)

  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)

  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)

  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)

  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)

  at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)

  at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)

  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)

  at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)

  at org.apache.spark.sql.Dataset.show(Dataset.scala:751)

  at org.apache.spark.sql.Dataset.show(Dataset.scala:710)

  at org.apache.spark.sql.Dataset.show(Dataset.scala:719)
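The `Unknown Source` frames above all come from the generated `GeneratedIteratorForCodegenStage1` class, so they do not point at the failing operator. One common debugging step (not a fix) is to disable whole-stage code generation and re-run the query, so the trace shows the interpreted operator classes instead; the follow-up message in this thread shows a trace obtained this way. A minimal sketch, using the `df29`/`LastName` names that appear later in the thread:

```scala
// Debugging aid: turn off whole-stage code generation so the stack trace
// shows interpreted operator frames instead of GeneratedClass "Unknown Source".
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// Re-run the failing aggregation to capture the more informative trace.
df29.groupBy("LastName").count().show()
```

Remember to set the flag back to `true` afterwards, since interpreted execution is noticeably slower.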

 


RE: UnknownSource NullPointerException in CodeGen. with Custom Strategy

Nasrulla Khan Haris

Stack trace with whole-stage code generation (WSCG) disabled:

 

 

scala> df29.groupBy("LastName").count().show()

20/06/28 06:20:55 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID 8, wn5-nkhwes.zhqzi2stszlevpekfsrlmpqjah.dx.internal.cloudapp.net, executor 4): java.lang.NullPointerException

        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)

        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)

        at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:193)

        at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:360)

        at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:112)

        at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:102)

        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)

        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)

        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)

        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)

        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)

        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)

        at org.apache.spark.scheduler.Task.run(Task.scala:122)

        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)

        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)

        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

 

Thanks,

Nasrulla
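With codegen disabled, the failure lands in `UnsafeWriter.write` inside `SpecificUnsafeProjection.apply` while the aggregation builds its grouping key. That is a classic symptom of a null value arriving in a column whose schema reports `nullable = false`: the generated projection omits the null check for that column. Since the job only fails on the cluster, one possibility is that the custom strategy or data source reports incorrect nullability for the Parquet columns. A minimal, hypothetical sketch of that failure mode (the column name `LastName` is taken from the query above; the nullability mismatch itself is my assumption, not something confirmed in this thread):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical repro sketch: the schema claims LastName is non-nullable,
// but the data actually contains a null. The generated UnsafeProjection
// then skips its null check and can NPE in UnsafeWriter.write, similar
// to the trace above.
val schema = StructType(Seq(StructField("LastName", StringType, nullable = false)))
val rows = spark.sparkContext.parallelize(Seq(Row("Khan"), Row(null)))
val df = spark.createDataFrame(rows, schema) // schema lies: nulls are present

df.groupBy("LastName").count().show() // may fail with a similar NPE
```

If this matches, comparing the schema the custom strategy reports against `spark.read.parquet(...).printSchema()` on the cluster files would confirm it.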

 

From: Nasrulla Khan Haris
Sent: Saturday, June 27, 2020 11:18 PM
To: [hidden email]
Subject: UnknownSource NullPointerException in CodeGen. with Custom Strategy

 


 


Re: UnknownSource NullPointerException in CodeGen. with Custom Strategy

wuyi
In reply to this post by Nasrulla Khan Haris
Hi Nasrulla,

Could you give a complete demo to reproduce the issue?



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]