structured streaming documentation does not match behavior

assaf.mendelson

Hi,

I have started experimenting with structured streaming, and the documentation (the structured streaming programming guide) does not seem to match the behavior I am actually seeing.

The documentation says that maxFilesPerTrigger (as well as latestFirst) are options for the File sink. In practice, however, at least maxFilesPerTrigger does not seem to have any effect there. On the other hand, the streaming source (readStream), which has no documentation for this option, does limit the number of files.

This behavior actually makes more sense than the documentation, since I would expect the file reader, rather than the sink, to define how files are read (e.g. if I used a kafka sink or a foreach sink, they should still get the same behavior from the reading side).

Thanks,
Assaf.

Re: structured streaming documentation does not match behavior

Shixiong(Ryan) Zhu
Good catch. These are file source options. Could you submit a PR to fix the doc? Thanks!
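
As a rough illustration of setting these options on the file source rather than on the sink, here is a minimal PySpark sketch. The function name `read_files_with_limit`, its parameters, and the choice of JSON input are assumptions made for the example and do not come from the thread; `spark` is assumed to be an existing SparkSession.

```python
def read_files_with_limit(spark, input_dir, schema,
                          max_files_per_trigger=1, latest_first=False):
    """Build a streaming DataFrame over the files in input_dir.

    Hypothetical helper: both options are set on the reader
    (spark.readStream), i.e. on the file source, not on the sink.
    """
    return (
        spark.readStream
        .schema(schema)  # file sources need an explicit schema by default
        # Upper bound on the number of new files picked up per trigger.
        .option("maxFilesPerTrigger", max_files_per_trigger)
        # Whether the newest files should be processed first.
        .option("latestFirst", str(latest_first).lower())
        .json(input_dir)  # JSON is an assumed input format
    )
```

Because the limit lives on the reader, the returned DataFrame can be written to any sink (for example `.writeStream.format("console").start()`) and the per-trigger file limit should apply in the same way.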

On Thu, Jun 15, 2017 at 10:46 AM, Mendelson, Assaf <[hidden email]> wrote:

> [...]

Re: structured streaming documentation does not match behavior

Shixiong(Ryan) Zhu

On Thu, Jun 15, 2017 at 10:55 AM, Shixiong(Ryan) Zhu <[hidden email]> wrote:
> [...]