re: sharing data via kafka broker using spark streaming / AnalysisException on collect()
I have a quick question regarding how to share data (a small data collection) between a kafka producer and consumer using spark streaming (spark 2.2):
(A) the data published by a kafka producer is received in order on the kafka consumer side (see (a) copied below).
(B) however, collect() or cache() on a streaming dataframe does not seem to be supported (see links in (b) below); I got this: Exception in thread "DataProducer" org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
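From what I understand, the analyzer rejects eager actions like collect() and cache() because a streaming dataframe is unbounded, so the only supported way to materialize it is through a sink started with writeStream.start(). A minimal sketch of what I mean (scala; the broker address and topic name are placeholders for my real ones):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("consumer").getOrCreate()

// streaming source: an unbounded dataframe
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "my-topic")                      // placeholder topic
  .load()

// df.collect()  // <-- this is the call that throws the AnalysisException

// supported: materialize through a sink started with writeStream.start()
val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()
query.awaitTermination()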
My questions would be:
--- How can I use the collection data (in a streaming dataframe) that arrives on the consumer side, e.g. convert it to an array of objects?
--- Or is there another quick way to use kafka to share static data (instead of streaming) between two spark application services (without any common spark context, session, etc.)? (I have sketched one idea right after this list.)
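One workaround I am considering for the static-data case (a sketch, assuming spark 2.2's batch kafka source; broker and topic names are again placeholders): skip readStream on the consumer side and read the topic as a bounded batch dataframe, on which collect() is allowed:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("batch-consumer").getOrCreate()
import spark.implicits._

// batch read over a bounded offset range: a plain (non-streaming) dataframe
val staticDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "my-topic")                      // placeholder topic
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// collect() is legal on a batch dataframe; convert to an array of (key, value) pairs
val records: Array[(String, String)] = staticDf
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
  .collect()

If the data really must arrive as a stream, I believe the spark 2.2 option would be a custom ForeachWriter sink (foreachBatch only appeared in spark 2.4), but for a small static collection the batch read above seems much simpler.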
I have copied a code snippet in (c).
Sharing a global collection between a spark producer and consumer seems like a very simple use case scenario.
But I have spent an entire day trying various options and going through online resources (general google results, the apache spark docs, stackoverflow, cloudera, etc.).
Any help would be very much appreciated!
(a) streaming data (df) received on the consumer side (console sink):