Building Kafka 0.10 Source for Structured Streaming Error.

satyajit vegesna
Hi All,

I am trying to build the kafka-0-10-sql module under the external folder of the Apache Spark source code.
I generate the jar file using
build/mvn package -DskipTests -pl external/kafka-0-10-sql
and the jar file is created under external/kafka-0-10-sql/target.

I then try to run spark-shell with the jars created in the target folder, as below:
bin/spark-shell --jars $SPARK_HOME/external/kafka-0-10-sql/target/*.jar 
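
A side note on the command above: spark-shell's --jars option takes a comma-separated list, so if the glob matches more than one jar in target (an assumption; the directory contents are not shown here), the shell expands it to space-separated arguments and only the first path is read as the --jars value. A minimal sketch of building the list explicitly; join_jars is a hypothetical helper name, not part of Spark:

```shell
# join_jars DIR: print every *.jar in DIR as one comma-separated line.
# Hypothetical helper for illustration; not shipped with Spark.
join_jars() {
  dir=$1
  out=""
  for f in "$dir"/*.jar; do
    [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
    out="${out:+$out,}$f"        # prepend a comma only after the first entry
  done
  printf '%s\n' "$out"
}
```

It could then be used as bin/spark-shell --jars "$(join_jars "$SPARK_HOME/external/kafka-0-10-sql/target")". This only changes how the jars are passed; it would not by itself add a jar that is missing from that directory.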

Running a Kafka read in that shell gives the error below:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/28 11:54:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://10.1.10.241:4040
Spark context available as 'sc' (master = local[*], app id = local-1498676043936).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val lines = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "test").load()
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
  at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:378)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:325)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
  ... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 57 more

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I have tried building the jar with dependencies, but I still get the same error.

However, when I use --packages with spark-shell instead, as in bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0, it works fine.

The reason I am trying to build from source is that I want to try writing DataFrame data to a Kafka topic, based on the commit https://github.com/apache/spark/commit/b0a5cd89097c563e9949d8cfcf84d18b03b8d24c, which is not available in version 2.1.0.


Any help would be highly appreciated.


Regards,

Satyajit.

