Re: [Structured streaming, V2] commit on ContinuousReader

Joseph Torres
In the master branch, we currently call this method in ContinuousExecution.commit().

Note that the ContinuousReader API is experimental and undergoing active design work. We will definitely include some kind of functionality to back-commit data once it's been processed, but the handle we eventually stabilize won't necessarily be `commit(end: Offset)`.
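
To illustrate how a source could use a hook of this kind, here is a minimal, self-contained Java sketch. It does not use Spark's classes: `LongOffset`, `AckingSource`, and `SketchReader` are hypothetical stand-ins, and treating the committed offset as inclusive is an assumption made for the example. It only shows the general idea of acknowledging everything up to the offset the engine reports as processed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CommitSketch {
    // Hypothetical stand-in for an offset: a monotonically increasing position.
    static final class LongOffset {
        final long value;
        LongOffset(long value) { this.value = value; }
    }

    // Hypothetical upstream client that needs explicit acknowledgement of records.
    static final class AckingSource {
        private final Deque<Long> unacked = new ArrayDeque<>();
        private long next = 0;
        private long lastAcked = -1;

        // Hand out the next record id and remember it as not yet acknowledged.
        long fetch() {
            long id = next++;
            unacked.addLast(id);
            return id;
        }

        // Acknowledge every outstanding record with id <= end (inclusive: an assumption).
        void ackUpTo(long end) {
            while (!unacked.isEmpty() && unacked.peekFirst() <= end) {
                lastAcked = unacked.removeFirst();
            }
        }

        int pending() { return unacked.size(); }
        long lastAcked() { return lastAcked; }
    }

    // Mirrors only the commit hook of a reader; everything else is omitted.
    static final class SketchReader {
        private final AckingSource source;
        SketchReader(AckingSource source) { this.source = source; }

        // Called by the engine once data up to `end` has been processed.
        void commit(LongOffset end) {
            source.ackUpTo(end.value);
        }
    }

    public static void main(String[] args) {
        AckingSource source = new AckingSource();
        SketchReader reader = new SketchReader(source);
        for (int i = 0; i < 5; i++) {
            source.fetch(); // records 0..4 are now outstanding
        }
        reader.commit(new LongOffset(2)); // engine reports offsets up to 2 as processed
        System.out.println(source.pending());   // prints 2
        System.out.println(source.lastAcked()); // prints 2
    }
}
```

In this sketch the source never acknowledges on its own; it waits for the engine to call `commit`, so records 3 and 4 stay outstanding until a later commit covers them.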

On Thu, May 3, 2018 at 10:43 AM, Jiří Syrový <[hidden email]> wrote:
Version: 2.3, DataSourceV2, ContinuousReader

Hi,

We're creating a new data source that fetches data from a streaming source which requires committing received data. We would like to commit periodically, once the data has been retrieved and correctly processed, and then fetch more.

One option would be to rely on Spark committing already-read data through commit(end: Offset), which is present in ContinuousReader (v2.reader.streaming), but it seems that this method is never called.

Is commit(end: Offset) ever called, and if so, when? I went through part of the Spark code base but haven't found any place where it is called.

Thanks,
Jiri