I'd like to know the root reason why multiple aggregations on a streaming DataFrame are not allowed, since it's a very useful feature and Flink has supported it for a long time.
There is a PR for this, but it has not been merged yet.
Here's the proposal for supporting it in "append" mode: https://github.com/apache/spark/pull/23576. You could see if it addresses your requirement and post your feedback in the PR.
For "update" mode it's going to be much harder to support this without first adding support for "retractions"; otherwise a downstream aggregation would see both the old and the updated value for the same key, and we would end up with wrong results.
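To make the "update" mode concern concrete, here is a toy sketch in plain Python. It is not Spark's API: the function names and the sample events are made up for illustration. It assumes a first aggregation that counts events per key and emits an update on every change, followed by a second aggregation that sums those counts.

```python
from collections import defaultdict


def first_stage_updates(events):
    """Hypothetical first aggregation: count per key.

    Emits (key, old_count, new_count) each time a key's count changes,
    mimicking what an "update" output mode would send downstream.
    """
    counts = defaultdict(int)
    for key in events:
        old = counts[key]
        counts[key] = old + 1
        yield key, old, counts[key]


def total_without_retractions(events):
    """Second aggregation that blindly adds every updated row it receives."""
    total = 0
    for _key, _old, new in first_stage_updates(events):
        total += new  # the previous value for this key is never removed
    return total


def total_with_retractions(events):
    """Second aggregation that retracts the old value before adding the new one."""
    total = 0
    for _key, old, new in first_stage_updates(events):
        total += new - old  # retract old contribution, apply new one
    return total


events = ["a", "b", "a", "a"]  # 4 events in total
print(total_without_retractions(events))  # 7, overcounted
print(total_with_retractions(events))     # 4, matches the number of events
```

Without retractions the downstream sum keeps stale intermediate counts for key "a" (1, then 2, then 3), which is the kind of wrong result described above.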
Thanks, I'll check it out.
I'm also very interested in this feature, but the PR has been open
since January 2019 and has not been updated. It raised a design
discussion around watermarks, and a design doc was written (https://docs.google.com/document/d/1IAH9UQJPUiUCLd7H6dazRK2k1szDX38SnM6GVNZYvUo/edit#heading=h.npkueh4bbkz1).
We also commented on this design, but no matter what, it seems that
the subject has gone stale.
Is there any interest in the community in delivering this feature,
or is it considered worthless? If the latter, can you explain why?
On 22/05/2019 03:38, 张万新 wrote:
Unfortunately, I don't see enough active committers working on Structured Streaming; I don't expect major features or improvements to land in this situation.
Technically I can review and merge PRs for major improvements in SS, but that depends on how much the proposal changes. If the proposal brings a conceptual change, review by a single committer still wouldn't be enough.
So it's not because we think it's worthless. (That might be only me, though.) My understanding is that there's simply not much investment in SS. There's also a known workaround for multiple aggregations (I've documented it in the SS guide, in the "Limitation of global watermark" section), though I totally agree the workaround is bad.
On Tue, Sep 1, 2020 at 12:28 AM Etienne Chauchot <[hidden email]> wrote:
Hi Jungtaek Lim,
Nice to hear from you again since the last time we talked :) and
congrats on becoming a Spark committer in the meantime! (If I'm
not mistaken, you were not one at the time.)
I totally agree with what you're saying about merging structural
parts of Spark without having a broader consensus. What I don't
understand is why there is not more investment in SS, especially
because in another thread the community is discussing
deprecating the regular DStream streaming framework.
Is Spark now mostly oriented toward batch?
PS: Yeah, I saw your update to the doc when I took a look at 3.0
preview 2 while searching for this particular feature. Regarding the
workaround, I'm not sure it meets my needs, as it will add delays
and may also interfere with watermarks.
On 04/09/2020 08:06, Jungtaek Lim wrote: