is there a way to persist the lineages generated by spark?

kant kodali
Hi All,

I am wondering whether there is a way to persist the lineages that Spark generates under the hood. Some of our clients want us to prove that the result of a computation shown on a dashboard is correct. If we could show the lineage of transformations executed to arrive at the result, that could be the Q.E.D. moment, but I am not sure whether this is even possible with Spark.

Thanks,
kant
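
As a rough illustration of the question above: RDDs expose `toDebugString()`, which returns a textual description of the lineage, and that text can be written to a log. The sketch below assumes PySpark; the helper names `lineage_text` and `persist_lineage` and the JSON-lines log format are hypothetical, not part of Spark. Note that this records a description of the lineage only. It is not replayable and does not by itself prove a result is correct.

```python
import json
import time


def lineage_text(rdd):
    """Return the RDD lineage as a str (PySpark may return bytes)."""
    text = rdd.toDebugString()
    return text.decode("utf-8") if isinstance(text, bytes) else text


def persist_lineage(rdd, path, label):
    """Append one JSON record describing the lineage of `rdd` to `path`.

    `label` is a hypothetical name for the dashboard figure being documented.
    """
    record = {
        "label": label,
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "lineage": lineage_text(rdd),
    }
    with open(path, "a", encoding="utf-8") as out:
        out.write(json.dumps(record) + "\n")
    return record
```

With a DataFrame, one could pass `df.rdd` to the helper, or call `df.explain(True)` to print the logical and physical plans.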

Re: is there a way to persist the lineages generated by spark?

Jörn Franke
I do think this is the right way; you will have to test with test data, verifying that the actual output of the calculation matches the expected output.
Even if the logical plan is correct, your calculation might not be. For example, there can be bugs in Spark or in the UI, or (very often) the client describes a calculation but the description turns out to be wrong.
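
The test-data check described in this reply can be sketched as follows. This is a minimal example under stated assumptions: `diff_rows` is a hypothetical helper, and the expected rows are assumed to be computed by hand (or by an independent method) for a small input. Rows are compared as multisets because Spark does not guarantee row order.

```python
from collections import Counter


def diff_rows(actual, expected):
    """Compare two collections of rows, ignoring order but not duplicates.

    Returns (missing, unexpected): rows that were expected but absent,
    and rows that were produced but not expected.
    """
    actual_counts = Counter(tuple(row) for row in actual)
    expected_counts = Counter(tuple(row) for row in expected)
    missing = list((expected_counts - actual_counts).elements())
    unexpected = list((actual_counts - expected_counts).elements())
    return missing, unexpected
```

In a Spark job, `actual` could be the result of `result_df.collect()` on the test dataset, and the test passes when both returned lists are empty.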

> On 4. Apr 2017, at 05:19, kant kodali <[hidden email]> wrote:
>
> Hi All,
>
> I am wondering whether there is a way to persist the lineages that Spark generates under the hood. Some of our clients want us to prove that the result of a computation shown on a dashboard is correct. If we could show the lineage of transformations executed to arrive at the result, that could be the Q.E.D. moment, but I am not sure whether this is even possible with Spark.
>
> Thanks,
> kant

Re: is there a way to persist the lineages generated by spark?

kant kodali
Yes, lineage that is actually replayable is what the validation process needs, so that we can answer questions such as how a system arrived at state S at time T. I guess a good analogy is event sourcing.


On Thu, Apr 6, 2017 at 10:30 PM, Jörn Franke <[hidden email]> wrote:
I do think this is the right way; you will have to test with test data, verifying that the actual output of the calculation matches the expected output.
Even if the logical plan is correct, your calculation might not be. For example, there can be bugs in Spark or in the UI, or (very often) the client describes a calculation but the description turns out to be wrong.


Re: is there a way to persist the lineages generated by spark?

Tom Lynch
This is not quite what you are asking, but I often save intermediate results to Parquet files so that I can diagnose problems and rebuild data from a known good state without re-running every processing step.
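
A minimal sketch of this snapshot approach, assuming PySpark. The helper names, the `run_id`/`stage` directory layout, and the overwrite mode are assumptions made for illustration and are not taken from the post. The only Spark calls used are `df.write.mode(...).parquet(path)` and `spark.read.parquet(path)`.

```python
import posixpath


def snapshot_path(base_dir, run_id, stage):
    """Build the snapshot location, e.g. <base_dir>/<run_id>/<stage>."""
    return posixpath.join(base_dir, run_id, stage)


def save_stage(df, base_dir, run_id, stage):
    """Write an intermediate DataFrame to Parquet and return its path."""
    path = snapshot_path(base_dir, run_id, stage)
    df.write.mode("overwrite").parquet(path)
    return path


def load_stage(spark, base_dir, run_id, stage):
    """Reload a saved stage so later steps can restart from it."""
    return spark.read.parquet(snapshot_path(base_dir, run_id, stage))
```

For example, `save_stage(joined_df, "/data/snapshots", "run-001", "02_joined")` after a join step, and `load_stage(spark, "/data/snapshots", "run-001", "02_joined")` to resume from it; all of these names and paths are placeholders.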

On Fri, Apr 7, 2017 at 1:08 AM, kant kodali <[hidden email]> wrote:
Yes, lineage that is actually replayable is what the validation process needs, so that we can answer questions such as how a system arrived at state S at time T. I guess a good analogy is event sourcing.

