Can someone please advise on the question below, regarding a Spark Streaming
query / ContextCleaner / garbage collection issue we are facing? We suspect a
bug is causing a memory leak.
We have a Spark 2.3 cluster running a streaming query. We are observing that
no matter how much memory we allocate to the executor, the JVM heap eventually
grows to its limit and the JVM's GC starts to cause frequent timeouts.
Eventually the executor is marked "lost" or "dead". GC logging is enabled, and
it takes about 30-45 minutes to fill the heap; after that, full GCs become
much more frequent. We have tried increasing executor memory, the periodic GC
interval, and other relevant memory parameters, but we keep observing the same
issue.
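For reference, the kind of settings we experimented with look roughly like
this (the config keys are standard Spark 2.x options; the specific values
shown here are illustrative, not our exact production values):

```shell
# Sketch of a spark-submit invocation with the memory/GC-related
# settings we tuned (values are examples only):
spark-submit \
  --master yarn \
  --conf spark.executor.memory=8g \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.memory.fraction=0.6 \
  --conf spark.cleaner.periodicGC.interval=10min \
  --class com.example.StreamingJob \
  streaming-job.jar
```

`spark.cleaner.periodicGC.interval` (default 30min) controls how often the
driver triggers a GC so that the ContextCleaner's weak references are
processed; lowering it did not change the behavior for us.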
We enabled ContextCleaner debug logs and observe only broadcast/accumulator
related cleaning messages. We do not see RDDs being received for cleanup —
there are no "Cleaning RDD ..." messages
(ref: ContextCleaner.scala#L213).
I have attached the ContextCleaner logs for reference as well.
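For anyone wanting to reproduce the logging we captured: we turned on the
debug output via the standard log4j.properties used by Spark 2.x (a minimal
sketch; adjust the appender/root logger to match your own setup):

```shell
# Append a logger for the ContextCleaner to conf/log4j.properties
# so that "Cleaning RDD ..." / "Cleaning broadcast ..." messages appear:
cat >> conf/log4j.properties <<'EOF'
log4j.logger.org.apache.spark.ContextCleaner=DEBUG
EOF
```

With this in place we see "Cleaning broadcast ..." and accumulator cleanup
lines, but never the RDD cleanup path.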