[DISCUSS] remove the incomplete code path on aggregation for continuous mode

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] remove the incomplete code path on aggregation for continuous mode

Jungtaek Lim-2
Hi devs,

during experiment on complete mode I realized we left some incomplete code parts on supporting aggregation for continuous mode. (shuffle & coalesce)

The work had been occurred around first half of 2018 and stopped, and no work has been done for around 2 years (so I don't expect anyone is working on this). The functionality is undocumented (as the work was only done partially) and continuous mode is experimental so I don't feel risks to get rid of the part.

What do you think? If it makes sense then I'll raise a PR to get rid of the incomplete codes.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'm kind of feeling the same with continuous mode itself - it has been still experimental over the 2 years, and while it lacks lots of essential things (backpressure, epoch synchronization, query progress, etc.) there's no additional major work over 1 year. We eventually have been excluded continuous mode for the new streaming feature like observable metric, streaming UI, etc. because it is on top of the feature which continuous mode lacks.

Unlike incomplete code path, I'm not strongly against this, as it has been documented and someone might use the feature in production. I still think it's ideal to retire the feature smoothly.
(Please chime in with the use case if someone has the production cases.)

ps2. It feels like "feature branch" looks to be a thing to consider for efforts on one big feature.
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

Jungtaek Lim-2
Let me share the effect on removing the incomplete and undocumented code path. I manually tried out removing the code path and here's the change.


1,120 lines deleted without hurting any existing streaming tests, except a suite I removed as well since it tests such code path. No need to update documentation as it was never publicized. This also removes some rules which only apply for continuous mode, which gave exceptions on the fact that "continuous mode is only available for map-like operations". Removing them would be back to simplify the usage of continuous mode.

Also worth noting that I had to manually remove the code path instead of revert, because the code path has been changed to reflect DSv2 change. What this means? We have to "update" the code path and concern about compatibility, etc. while it never be used in production.

On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim <[hidden email]> wrote:
Hi devs,

during experiment on complete mode I realized we left some incomplete code parts on supporting aggregation for continuous mode. (shuffle & coalesce)

The work had been occurred around first half of 2018 and stopped, and no work has been done for around 2 years (so I don't expect anyone is working on this). The functionality is undocumented (as the work was only done partially) and continuous mode is experimental so I don't feel risks to get rid of the part.

What do you think? If it makes sense then I'll raise a PR to get rid of the incomplete codes.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'm kind of feeling the same with continuous mode itself - it has been still experimental over the 2 years, and while it lacks lots of essential things (backpressure, epoch synchronization, query progress, etc.) there's no additional major work over 1 year. We eventually have been excluded continuous mode for the new streaming feature like observable metric, streaming UI, etc. because it is on top of the feature which continuous mode lacks.

Unlike incomplete code path, I'm not strongly against this, as it has been documented and someone might use the feature in production. I still think it's ideal to retire the feature smoothly.
(Please chime in with the use case if someone has the production cases.)

ps2. It feels like "feature branch" looks to be a thing to consider for efforts on one big feature.
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

Jungtaek Lim-2
Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a couple of days if there's no voice on the reason we should keep it.


On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim <[hidden email]> wrote:
Let me share the effect on removing the incomplete and undocumented code path. I manually tried out removing the code path and here's the change.


1,120 lines deleted without hurting any existing streaming tests, except a suite I removed as well since it tests such code path. No need to update documentation as it was never publicized. This also removes some rules which only apply for continuous mode, which gave exceptions on the fact that "continuous mode is only available for map-like operations". Removing them would be back to simplify the usage of continuous mode.

Also worth noting that I had to manually remove the code path instead of revert, because the code path has been changed to reflect DSv2 change. What this means? We have to "update" the code path and concern about compatibility, etc. while it never be used in production.

On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim <[hidden email]> wrote:
Hi devs,

during experiment on complete mode I realized we left some incomplete code parts on supporting aggregation for continuous mode. (shuffle & coalesce)

The work had been occurred around first half of 2018 and stopped, and no work has been done for around 2 years (so I don't expect anyone is working on this). The functionality is undocumented (as the work was only done partially) and continuous mode is experimental so I don't feel risks to get rid of the part.

What do you think? If it makes sense then I'll raise a PR to get rid of the incomplete codes.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'm kind of feeling the same with continuous mode itself - it has been still experimental over the 2 years, and while it lacks lots of essential things (backpressure, epoch synchronization, query progress, etc.) there's no additional major work over 1 year. We eventually have been excluded continuous mode for the new streaming feature like observable metric, streaming UI, etc. because it is on top of the feature which continuous mode lacks.

Unlike incomplete code path, I'm not strongly against this, as it has been documented and someone might use the feature in production. I still think it's ideal to retire the feature smoothly.
(Please chime in with the use case if someone has the production cases.)

ps2. It feels like "feature branch" looks to be a thing to consider for efforts on one big feature.