Spark behavior with changing data source


Vipul Rajan

I have a use case where I am joining a streaming DataFrame with a static DataFrame. The static DataFrame is read from a parquet table (a directory containing parquet files). That parquet data is updated by another process once a day. I am using Structured Streaming for the streaming DataFrame.
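To make the setup concrete, here is a minimal sketch of what I am doing. The Kafka source, the paths, and the `customer_id` column are placeholders for illustration, not my actual job:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("stream-static-join").getOrCreate()

// Static side: read once from a parquet directory that another
// process rewrites once a day.
val staticDf = spark.read.parquet("/data/dim/customers")

// Streaming side: a Structured Streaming source (Kafka as an example).
val streamDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS customer_id")

// Stream-static join: the static plan is resolved when the query starts.
val joined = streamDf.join(staticDf, Seq("customer_id"), "left_outer")

val query = joined.writeStream
  .format("console")
  .start()
```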

My question is what would happen to my static DataFrame?

  • Would it pick up the new data because of lazy execution, or is there some caching behavior that could prevent this?

  • Can the update process make my code crash?

  • Is there any way to force the static DataFrame to refresh itself once a day?

I am working with Spark 2.3.2.
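For reference, the kind of forced daily refresh I have in mind would look roughly like the sketch below: stop the streaming query on a schedule and start a new one so the parquet snapshot is re-read when the static plan is resolved again. The paths and the rate source are placeholders, and I have not verified that this is the recommended approach:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

def startQuery(spark: SparkSession): StreamingQuery = {
  // Re-read a fresh snapshot of the parquet directory each time
  // the query is (re)started.
  val staticDf = spark.read.parquet("/data/dim/customers")

  // Placeholder streaming source; my real job uses a different one.
  val streamDf = spark.readStream.format("rate").load()
    .withColumnRenamed("value", "customer_id")

  streamDf.join(staticDf, Seq("customer_id"))
    .writeStream.format("console").start()
}

val spark = SparkSession.builder.getOrCreate()
@volatile var query = startQuery(spark)

// Once a day, stop the running query and start a new one so the
// static side is re-planned against the updated parquet files.
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(new Runnable {
  def run(): Unit = {
    query.stop()
    query = startQuery(spark)
  }
}, 1, 1, TimeUnit.DAYS)
```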