SQL Visualization for cached Dataset

Tomasz Gawęda

Recently I had to optimize a few Apache Spark SQL queries. Some of the
Datasets were reused, so they were cached. However, after caching I no
longer see the SQL visualization for the cached Dataset in the Spark UI;
I see only an InMemoryRelation node. The explain output at the bottom of
the page still shows the full plan.
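
A minimal sketch of how this situation can be reproduced, assuming a local
SparkSession. The object name, the Dataset name `aggregated`, and the query
itself are hypothetical stand-ins, not the actual job:

```scala
import org.apache.spark.sql.SparkSession

object CachedPlanRepro {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to look at the SQL tab.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cached-plan-repro")
      .getOrCreate()

    // Hypothetical stand-in for an expensive, reused Dataset.
    val aggregated = spark.range(0, 1000000)
      .selectExpr("id % 10 AS key", "id AS value")
      .groupBy("key")
      .count()
      .cache() // lazy: only marks the Dataset for caching

    // The first action materializes the cache.
    aggregated.count()

    // A later query built on top of the cached Dataset.
    val filtered = aggregated.filter("key > 5")
    filtered.collect()

    // Print the textual plans of the later query, to compare
    // with what the SQL tab in the Spark UI draws for it.
    filtered.explain(true)

    spark.stop()
  }
}
```

While the application is running, the SQL tab of the Spark UI (port 4040 by
default) can be compared with the text printed by `explain(true)`.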

Is this expected behaviour? In such cases we have far fewer options for
debugging performance in Spark. My suggestion is to show the full diagram
on the first action after cache, or to show a separate SQL query for the
cache. The second option is probably not possible, however, because cache
does not trigger computation, so we can't get metrics.

The workaround is to temporarily disable caching, but doing so takes a lot
of time, especially on large datasets.
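
One way to make that workaround less painful is to route every cache call
through a single switch. This is only a sketch: the helper `maybeCache` and
the system property `debug.disableCache` are made-up names, not Spark
settings.

```scala
import org.apache.spark.sql.Dataset

object CacheToggle {
  // Hypothetical switch: start the JVM with -Ddebug.disableCache=true
  // to skip caching and see the full plan in the SQL tab.
  val cacheDisabled: Boolean =
    sys.props.get("debug.disableCache").contains("true")

  // Returns the Dataset unchanged when caching is disabled,
  // otherwise marks it for caching as usual.
  def maybeCache[T](ds: Dataset[T]): Dataset[T] =
    if (cacheDisabled) ds else ds.cache()
}
```

Calling `CacheToggle.maybeCache(ds)` instead of `ds.cache()` leaves the
rest of the job unchanged. With caching off the reused Dataset is recomputed
for every action, so the run is still slow on large inputs.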

Best regards,

