Possible bug - Java iterator/iterable inconsistency

Asher Krim
In the Spark 2 Java RDD API, uses of Iterable were replaced with Iterator. I just encountered an inconsistency in `flatMapValues` that may be a bug:


The problem is that `FlatMapFunction` was changed to return an iterator, but `rdd.flatMapValues` still expects an iterable. Am I using these constructs correctly? Is there a workaround other than converting the iterator to an iterable outside of the function?

Thanks,
--
Asher Krim
Senior Software Engineer

Re: Possible bug - Java iterator/iterable inconsistency

Sean Owen
Hm. Unless I am also totally missing or forgetting something, I think you're right. The equivalent in PairRDDFunctions.scala operates on a function from T to TraversableOnce[U], and a TraversableOnce is most like a java.util.Iterator.

You can work around it by wrapping it in a faked IteratorIterable.
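
For readers looking for what such a wrapper might look like, here is a minimal sketch in plain Java. The class name `IteratorIterable` and its `of` factory are hypothetical (they are not part of Spark or the JDK), and the sketch assumes the consumer calls `iterator()` at most once, which has not been verified against Spark's internals here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not a Spark or JDK class): presents a one-shot
// Iterator through the Iterable interface.
final class IteratorIterable<T> implements Iterable<T> {
    private final Iterator<T> source;
    private boolean consumed = false;

    IteratorIterable(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        // The underlying Iterator can only be walked once, so a second
        // request is rejected instead of silently returning no elements.
        if (consumed) {
            throw new IllegalStateException("iterator() was already called");
        }
        consumed = true;
        return source;
    }

    // Convenience factory so call sites can write IteratorIterable.of(it).
    static <T> Iterable<T> of(Iterator<T> source) {
        return new IteratorIterable<>(source);
    }

    public static void main(String[] args) {
        Iterator<String> words = Arrays.asList("a b c".split(" ")).iterator();
        List<String> collected = new ArrayList<>();
        for (String w : IteratorIterable.of(words)) {
            collected.add(w);
        }
        System.out.println(collected); // prints [a, b, c]
    }
}
```

Inside the function passed to `flatMapValues`, the idea would be to return `IteratorIterable.of(someIterator)` rather than the iterator itself. With Java 8, a lambda such as `() -> someIterator` should serve as a shorter equivalent, since `Iterable` has a single abstract method, though it gives no protection against being iterated twice.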

I think this is fixable in the API by deprecating this method and adding a new one that takes a FlatMapFunction. We'd have to triple-check in a test that this doesn't cause an API compatibility problem with respect to Java 8 lambdas, but if that's settled, I think this could be fixed without breaking the API.

In reply to Asher Krim's message of Wed, Jan 18, 2017 at 8:50 PM.

Re: Possible bug - Java iterator/iterable inconsistency

Sean Owen
Yes, confirmed that fixing it unfortunately causes trouble in Java 8. See https://issues.apache.org/jira/browse/SPARK-19287 for further discussion.

In reply to Sean Owen's message of Wed, Jan 18, 2017 at 9:00 PM.

Re: Possible bug - Java iterator/iterable inconsistency

Asher Krim
Thanks Sean!

In reply to Sean Owen's message of Thu, Jan 19, 2017 at 6:09 AM.