Issue with map Java lambda function with 3.0.0 preview and preview 2


Issue with map Java lambda function with 3.0.0 preview and preview 2

Jean Georges Perrin
Hey guys,

This code:

    Dataset<Row> incrementalDf = spark
        .createDataset(l, Encoders.INT())
        .toDF();
    Dataset<Integer> dotsDs = incrementalDf
        .map(status -> {
          double x = Math.random() * 2 - 1;
          double y = Math.random() * 2 - 1;
          counter++;
          if (counter % 100000 == 0) {
            System.out.println("" + counter + " darts thrown so far");
          }
          return (x * x + y * y <= 1) ? 1 : 0;
        }, Encoders.INT());

used to work with Spark 2.x, but with the two 3.0.0 previews it fails with:

The method map(Function1<Row,Integer>, Encoder<Integer>) is ambiguous for the type Dataset<Row>

If I define my mapping function as a class, it works fine. Here is the class:

  private final class DartMapper
      implements MapFunction<Row, Integer> {
    private static final long serialVersionUID = 38446L;

    @Override
    public Integer call(Row r) throws Exception {
      double x = Math.random() * 2 - 1;
      double y = Math.random() * 2 - 1;
      counter++;
      if (counter % 1000 == 0) {
        System.out.println("" + counter + " operations done so far");
      }
      return (x * x + y * y <= 1) ? 1 : 0;
    }
  }

Any hint on what I did wrong, if anything?

jg




Re: Issue with map Java lambda function with 3.0.0 preview and preview 2

Jean Georges Perrin
I forgot to mention: it does the same thing with the reducer:

    int dartsInCircle = dotsDs.reduce((x, y) -> x + y);

jg


Re: Issue with map Java lambda function with 3.0.0 preview and preview 2

Sean Owen
In reply to this post by Jean Georges Perrin
Yes, it's necessary to cast the lambda in Java as (MapFunction<X,Y>)
in many cases. This is because the Scala-specific and Java-specific
versions of .map() both end up accepting a function object that the
lambda can match, and an Encoder. What I'd have to go back and look up
is why that would be different in Spark 3; some of that has always
been the case with Java 8 in Spark 2. I think it might be related to
Scala 2.12; were you using Spark 2 with Scala 2.11 before?
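
For example, casting the lambda resolves the ambiguity (a sketch based on your snippet above; the cast is the only change, and it assumes org.apache.spark.api.java.function.MapFunction is imported):

    Dataset<Integer> dotsDs = incrementalDf
        // The cast selects the Java-specific overload map(MapFunction, Encoder)
        .map((MapFunction<Row, Integer>) status -> {
          double x = Math.random() * 2 - 1;
          double y = Math.random() * 2 - 1;
          counter++;
          if (counter % 100000 == 0) {
            System.out.println("" + counter + " darts thrown so far");
          }
          return (x * x + y * y <= 1) ? 1 : 0;
        }, Encoders.INT());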


Re: Issue with map Java lambda function with 3.0.0 preview and preview 2

Jean Georges Perrin
Thanks Sean - yup, I was having issues with Scala 2.12 for some stuff, so I kept 2.11...

Casting works. It makes the code a little ugly, but… it’s definitely a Scala 2.12 vs. 2.11 issue, not a Spark 3 issue specifically.
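
For the record, the reducer needed the same treatment (a sketch; ReduceFunction likewise comes from org.apache.spark.api.java.function):

    // The cast picks Dataset.reduce(ReduceFunction<Integer>) over the Scala overload
    int dartsInCircle = dotsDs.reduce((ReduceFunction<Integer>) (x, y) -> x + y);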

jg


