Disallowing null mergeCombiners

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Disallowing null mergeCombiners

Patrick Wendell
Hey All,

There is a small API change that we are considering for the external
sort patch. Previously we allowed mergeCombiner to be null when map
side aggregation was not enabled. This is because it wasn't necessary
in that case since mappers didn't ship pre-aggregated values to
reducers.

Because the external sort capability also relies on the mergeCombiner
function to merge partially-aggregated on-disk segments, we now need
it all the time, even if map side aggregation is enabled. This is a
fairly esoteric thing that I'm not sure anyone other than Shark ever
used, but I want to check in case anyone had feelings about this.

The relevant code is here:

https://github.com/apache/incubator-spark/pull/303/files#diff-f70e97c099b5eac05c75288cb215e080R72

- Patrick
Reply | Threaded
Open this post in threaded view
|

Re: Disallowing null mergeCombiners

rxin
I added the option that doesn't require the caller to specify the
mergeCombiner closure a while ago when I wanted to disable mapSideCombine.
In virtually all use cases I know of, it is fine & easy to specify a
mergeCombiner, so I'm all for this given it simplifies the codebase.


On Tue, Dec 31, 2013 at 5:05 PM, Patrick Wendell <[hidden email]> wrote:

> Hey All,
>
> There is a small API change that we are considering for the external
> sort patch. Previously we allowed mergeCombiner to be null when map
> side aggregation was not enabled. This is because it wasn't necessary
> in that case since mappers didn't ship pre-aggregated values to
> reducers.
>
> Because the external sort capability also relies on the mergeCombiner
> function to merge partially-aggregated on-disk segments, we now need
> it all the time, even if map side aggregation is enabled. This is a
> fairly esoteric thing that I'm not sure anyone other than Shark ever
> used, but I want to check in case anyone had feelings about this.
>
> The relevant code is here:
>
>
> https://github.com/apache/incubator-spark/pull/303/files#diff-f70e97c099b5eac05c75288cb215e080R72
>
> - Patrick
>