sortWithinPartitions in Structured Streaming

Currently, all sorting is disallowed with structured streaming queries. Not
allowing global sorting makes sense, as you can't sort an infinite list, but
could non-global sorting (i.e. sortWithinPartitions) be allowed? I'm running
into this with an external source I'm using, but not sure if this would be
useful to file sources as well. I have to foreachBatch so that I can do a

Two main questions:
- Does a local sort cause issues with any exactly-once guarantees streaming
queries provide? I can't say I know or understand how these semantics work.
Or are there other issues I can't think of this would cause?

- Is the change as simple as changing the unsupported operations check to
only look for global sorts instead of all sorts?

The only other discussion on this topic I found is here, which suggested the
local sort might be something to consider allowing in structured streaming.
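
For reference, the foreachBatch workaround mentioned above looks roughly like
the sketch below (PySpark). The "rate" source, the "value" sort column, the
/tmp/sorted_output path, and the function names are placeholders chosen for
illustration, not the actual external source; it only assumes the standard
DataStreamWriter.foreachBatch and DataFrame.sortWithinPartitions calls.

```python
def write_sorted_batch(batch_df, batch_id, sort_col="value", path="/tmp/sorted_output"):
    """Sort one micro-batch within each partition, then append it to `path`."""
    # Inside foreachBatch the micro-batch is an ordinary (non-streaming)
    # DataFrame, so sortWithinPartitions is permitted here.
    sorted_df = batch_df.sortWithinPartitions(sort_col)
    # Placeholder sink: append the locally sorted batch as Parquet files.
    sorted_df.write.mode("append").parquet(path)


def start_sorted_query(spark):
    """Start a streaming query that sends every micro-batch to write_sorted_batch."""
    # Placeholder source: the built-in "rate" source emits (timestamp, value) rows.
    stream_df = spark.readStream.format("rate").load()
    return stream_df.writeStream.foreachBatch(write_sorted_batch).start()
```

Calling start_sorted_query(spark) with an existing SparkSession returns the
running StreamingQuery. As far as I understand, foreachBatch on its own gives
at-least-once writes, so batch_id would have to be used for de-duplication if
exactly-once output matters.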
