Bridging the gap between Spark UI and Code

Bridging the gap between Spark UI and Code

Michal Sankot
Hi,
when I analyze and debug our Spark batch job executions, it's a pain to
figure out how blocks in the Spark UI Jobs/SQL tabs correspond to the
actual Scala code that we write, and how much time they take. Would
there be a way to somehow instruct the compiler, or something similar,
to get this information into the Spark UI?

At the moment, linking Spark UI elements to our code is guesswork,
driven by adding and removing lines of code and rerunning the job, which
is tedious. Anything that made our lives easier, e.g. a dedicated debug
mode for Spark jobs in which this information would be available, would
be greatly appreciated. (Though I don't know whether it's possible at
all.)

Thanks,
Michal



Re: Bridging the gap between Spark UI and Code

RussS
Have you looked at the DAG visualization? Each block refers to the code line that invoked it.

For DataFrames, the execution plan will tell you explicitly which operations are in which stages.
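
For example, a rough sketch (assuming a SparkSession `spark` and a DataFrame `df`; the description, columns, and path below are made up):

    // Print the parsed/analyzed/optimized/physical plans so the
    // operators shown in the SQL tab can be matched to the query.
    df.explain(true)

    // Tag the jobs triggered below with a human-readable description
    // that shows up in the Spark UI's Jobs tab.
    spark.sparkContext.setJobDescription("daily aggregation step")
    df.groupBy("day").count().write.parquet("/tmp/daily_counts")
    spark.sparkContext.setJobDescription(null)  // clear the tag afterwards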


Re: Bridging the gap between Spark UI and Code

Michal Sankot
Yes, the problem is that the DAG only refers to the code line (the action) that invoked it. It doesn't provide information about how individual transformations link to the code.

So you can have a dozen stages, each tagged with the same invoking code line but doing different things, and then we're left guessing what each one actually does. A partial workaround is sketched below.
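
One thing we could try (a sketch, assuming `sc` is the SparkContext; the group names and paths are made up) is wrapping each logical section of the pipeline in a named job group, so the Jobs tab at least groups work by code region:

    val sc = spark.sparkContext

    // Job groups take effect when an action runs (transformations are
    // lazy), so each labeled section should end in an action.
    sc.setJobGroup("load", "read and cache raw events")
    val events = spark.read.json("/data/events")  // schema inference job
    events.cache().count()                        // materializes under "load"

    sc.setJobGroup("aggregate", "daily counts per user")
    events.groupBy("day", "userId").count()
      .write.parquet("/tmp/daily_counts")         // runs under "aggregate"
    sc.clearJobGroup()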



Re: Bridging the gap between Spark UI and Code

Michal Sankot
And to be clear: yes, the execution plan shows exactly what it's doing. The problem is that it's unclear how that relates to the actual Scala/Python code.
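
A related knob (a sketch; the label, `events`, `geoTable`, and the path are made up) is SparkContext.setCallSite, which overrides the call-site text the UI attaches to jobs and stages started afterwards:

    // Stamp subsequent jobs/stages with our own marker instead of the
    // automatically captured "<operation> at <file>:<line>" string.
    sc.setCallSite("geo enrichment (Pipeline.scala, step 3)")
    events.join(geoTable, "ip").write.parquet("/tmp/geo_enriched")
    sc.clearCallSite()  // restore automatic call-site capture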



--
Michal Sankot
Big Data Engineer