[Structured Streaming] OOM on ConsoleSink with large inputs


Gerard Maas
Devs,  

While investigating another issue, I came across an OOM error when using the Console Sink with a source whose input can be larger than the available driver memory. In my case, I was using the File source with a 14 GB file in the monitored directory.

I traced the issue back to a `df.collect` call in the Console Sink code, which pulls every row of the batch into driver memory.
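
To illustrate why a collect-style call is a problem for a sink that only prints a handful of rows, here is a minimal sketch in plain Python. It is a model of the behaviour, not Spark code: `rows`, `collect_all`, and `take_bounded` are hypothetical names made up for this example, and the generator merely stands in for a batch read from a large file.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def rows(n: int) -> Iterator[str]:
    """Lazily yield n rows; a stand-in for a batch read from a large file."""
    for i in range(n):
        yield f"row-{i}"


def collect_all(batch: Iterable[str]) -> List[str]:
    """Model of a collect-style call: every row ends up in one in-memory list."""
    return list(batch)


def take_bounded(batch: Iterable[str], limit: int) -> List[str]:
    """Pull at most `limit` rows and leave the rest of the batch unread."""
    return list(islice(batch, limit))


# A bounded take only touches the rows that will be displayed,
# so its memory use does not grow with the size of the batch.
preview = take_bounded(rows(10_000_000), limit=20)
print(len(preview))
```

With `collect_all`, memory use grows with the size of the batch, which is the failure mode described above once the batch exceeds driver memory. With `take_bounded`, it depends only on `limit`. Whether a bounded read is the right fix for the Console Sink itself is for a committer to judge.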

I hope a committer can check it out.

-kr, Gerard.