Spark data source resiliency

Spark data source resiliency

assaf.mendelson
Hi All,

I have implemented a data source V2 which integrates with an internal system,
and I need to make it resilient to errors in that internal system.

The issue is that currently, if there is an exception in the data reader,
the exception seems to fail the entire task. I would prefer instead to just
restart the relevant partition.

Is there a way to do it or would I need to solve it inside the iterator
itself?
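
For concreteness, solving it inside the iterator would look something like the sketch below. This is only a sketch against the Spark 2.3 `DataReader` interface; `InternalSystemClient` and its `reconnect` hook are hypothetical stand-ins for the internal system's API:

```scala
import java.io.IOException
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.v2.reader.DataReader

// Hypothetical handle to the internal system (not a real API).
trait InternalSystemClient extends java.io.Closeable {
  def hasNextRow: Boolean
  def fetchRow(): Row
  def reconnect(): Unit  // re-establish the connection after a failure
}

// Retries failed reads inside the reader itself, so a transient error in
// the internal system never escapes and fails the Spark task.
class ResilientDataReader(client: InternalSystemClient) extends DataReader[Row] {
  private val maxAttempts = 3

  override def next(): Boolean = withRetry(client.hasNextRow)
  override def get(): Row = withRetry(client.fetchRow())
  override def close(): Unit = client.close()

  private def withRetry[A](op: => A): A = {
    var attempt = 1
    while (true) {
      try {
        return op
      } catch {
        case _: IOException if attempt < maxAttempts =>
          attempt += 1
          client.reconnect()  // recover, then retry the same read
      }
    }
    sys.error("unreachable")  // satisfies the type checker
  }
}
```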

Thanks,
    Assaf.



Re: Spark data source resiliency

cloud0fan
A failure in the data reader results in a task failure, and Spark will retry the task for you (IIRC it retries 3 times before failing the job).

Can you check your Spark log and see if the task fails consistently?
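
For reference, on a real cluster the knob behind this behavior is `spark.task.maxFailures` (the number of attempts a single task gets before the job is failed; the default is 4). A minimal sketch of setting it explicitly, with a hypothetical app name:

```scala
import org.apache.spark.sql.SparkSession

// spark.task.maxFailures governs how many attempts a single task gets
// before the whole job is failed (default 4 in cluster modes).
val spark = SparkSession.builder()
  .appName("retry-demo")  // hypothetical app name
  .config("spark.task.maxFailures", "4")
  .getOrCreate()
```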

Re: Spark data source resiliency

assaf.mendelson
That is what I expected. However, I did a very simple test (using println
just to see when the exception is triggered in the iterator) with a local
master, and I saw it fail once and cause the entire operation to fail.

Is this something that may be unique to the local master (or some default
configuration that should be set when testing)? I can't find a specific
configuration to handle this in the documentation.

Thanks,
    Assaf.




Re: Spark data source resiliency

cloud0fan
I believe you are using something like `local[8]` as your Spark master, which can't retry tasks. Please try `local[8, 3]`, which can retry failed tasks 3 times.
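
A minimal sketch of what that looks like (the app name is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

// local[N, M]: N worker threads, and each task gets up to M attempts
// before the job is failed.
val spark = SparkSession.builder()
  .master("local[8, 3]")
  .appName("resiliency-test")  // hypothetical app name
  .getOrCreate()
```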

Re: Spark data source resiliency

assaf.mendelson
You are correct; this solved it.
Thanks


