Scala examples for Spark do not work as written in documentation

GlennStrycker
On the webpage http://spark.apache.org/examples.html, there is an example written as

val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

This does not execute in Spark, which gives me an error:
<console>:2: error: illegal start of simple expression
         val x = Math.random()
         ^

If I rewrite the query slightly, adding in {}, it works:

val count = spark.parallelize(1 to 10000).map(i =>
   {
   val x = Math.random()
   val y = Math.random()
   if (x*x + y*y < 1) 1 else 0
   }
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / 10000.0)

Re: Scala examples for Spark do not work as written in documentation

rxin
Thanks for pointing it out. We should update the website to fix the code.

val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
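
For anyone who wants to try the two working forms without a cluster, here is a minimal sketch that swaps the RDD for a plain Scala range. That swap is an assumption made for illustration, as are the names ClosureSyntax, numSamples, countBraceInParens and countBraces. With the Scala 2 compilers in use at the time of this thread, the argument inside map( ... ) has to be a single expression, so a "val" definition cannot start there; a { ... } block is itself an expression, which is why both forms below parse.

```scala
object ClosureSyntax {
  // Hypothetical sample count; the website example calls this NUM_SAMPLES.
  val numSamples = 10000

  // Form 1: keep the parentheses and wrap the multi-statement body in braces.
  val countBraceInParens: Int = (1 to numSamples).map(i => {
    val x = Math.random()
    val y = Math.random()
    if (x*x + y*y < 1) 1 else 0
  }).reduce(_ + _)

  // Form 2: pass the function literal in braces instead of parentheses.
  val countBraces: Int = (1 to numSamples).map { i =>
    val x = Math.random()
    val y = Math.random()
    if (x*x + y*y < 1) 1 else 0
  }.reduce(_ + _)

  def main(args: Array[String]): Unit = {
    println("Pi is roughly " + 4.0 * countBraceInParens / numSamples)
    println("Pi is roughly " + 4.0 * countBraces / numSamples)
  }
}
```

The two forms do the same thing; the second is simply the more common way to write a multi-line function literal in Scala.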



On Fri, May 16, 2014 at 9:41 AM, GlennStrycker <[hidden email]> wrote:

> On the webpage http://spark.apache.org/examples.html, there is an example
> written as
>
> val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
>   val x = Math.random()
>   val y = Math.random()
>   if (x*x + y*y < 1) 1 else 0
> ).reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
>
> This does not execute in Spark, which gives me an error:
> <console>:2: error: illegal start of simple expression
>          val x = Math.random()
>          ^
>
> If I rewrite the query slightly, adding in {}, it works:
>
> val count = spark.parallelize(1 to 10000).map(i =>
>    {
>    val x = Math.random()
>    val y = Math.random()
>    if (x*x + y*y < 1) 1 else 0
>    }
> ).reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / 10000.0)
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>

Re: Scala examples for Spark do not work as written in documentation

Mark Hamstra
In reply to this post by GlennStrycker
Actually, the better way to write the multi-line closure would be:

val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)



Re: Scala examples for Spark do not work as written in documentation

Mark Hamstra
Sorry, looks like an extra line got inserted in there.  One more try:

val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)



On Fri, May 16, 2014 at 12:36 PM, Mark Hamstra <[hidden email]> wrote:

> Actually, the better way to write the multi-line closure would be:
>
> val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
>
>   val x = Math.random()
>   val y = Math.random()
>   if (x*x + y*y < 1) 1 else 0
> }.reduce(_ + _)

Re: Scala examples for Spark do not work as written in documentation

GlennStrycker
In reply to this post by Mark Hamstra
Why does the reduce function only work on sums of keys of the same type and does not support other functional forms?

I am having trouble in another example where instead of 1s and 0s, the output of the map function is something like A=(1,2) and B=(3,4).  I need a reduce function that can return something complicated based on reduce( (A,B) => (arbitrary fcn1 of A and B, arbitrary fcn2 of A and B) ), but I am only getting reduce( (A,B) => (arbitrary fcn1 of A, arbitrary fcn2 of A) ).

See http://apache-spark-developers-list.1001551.n3.nabble.com/reduce-only-removes-duplicates-cannot-be-arbitrary-function-td6606.html
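
A small sketch of a reduce over pairs, in case it helps. The function handed to reduce receives two elements and must return a value of the same element type, so each output component is free to depend on both arguments. Plain Scala collections stand in for the RDD here so that it runs without Spark (an assumption for illustration), and the choice of a sum and a product for the two component functions is arbitrary. On an RDD the operator should also be commutative and associative, as the Spark API documentation for reduce requires.

```scala
object TupleReduce {
  // Stand-ins for the map output A = (1,2) and B = (3,4) from the post.
  val pairs: Seq[(Int, Int)] = Seq((1, 2), (3, 4))

  // Each output component may depend on both inputs:
  // first component = a._1 + b._1, second component = a._2 * b._2.
  val combined: (Int, Int) = pairs.reduce { (a, b) =>
    (a._1 + b._1, a._2 * b._2)
  }

  def main(args: Array[String]): Unit =
    println(combined) // prints (4,8)
}
```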

Re: Scala examples for Spark do not work as written in documentation

andy
Administrator
In reply to this post by Mark Hamstra
I fixed the bug, but I kept the parameter "i" instead of "_" since that (1)
keeps it more parallel to the Python and Java versions, which also use
functions with a named variable, and (2) doesn't require readers to know
this particular use of the "_" syntax in Scala.
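
For readers who have not met the two uses of "_" that appear in this thread, here is a minimal sketch on a plain Scala range rather than an RDD (an assumption, so that it runs without Spark). The names UnderscoreForms, xs, named, ignored, sumPlaceholder and sumExplicit are illustrative only.

```scala
object UnderscoreForms {
  val xs = 1 to 5

  // Named parameter: `i` is bound but never used in the body.
  val named: Seq[Int] = xs.map { i => 1 }

  // Wildcard parameter: `_ =>` states that the argument is ignored.
  val ignored: Seq[Int] = xs.map { _ => 1 }

  // Placeholder syntax, a different use of `_`: each `_` stands for the
  // next argument, so `_ + _` is shorthand for (a, b) => a + b.
  val sumPlaceholder: Int = xs.reduce(_ + _)
  val sumExplicit: Int = xs.reduce((a, b) => a + b)

  def main(args: Array[String]): Unit = {
    println(named == ignored)              // prints true
    println(sumPlaceholder == sumExplicit) // prints true
  }
}
```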

Thanks for catching this, Glenn.

Andy


On Fri, May 16, 2014 at 12:38 PM, Mark Hamstra <[hidden email]> wrote:

> Sorry, looks like an extra line got inserted in there.  One more try:
>
> val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
>   val x = Math.random()
>   val y = Math.random()
>   if (x*x + y*y < 1) 1 else 0
> }.reduce(_ + _)

Re: Scala examples for Spark do not work as written in documentation

Will Benton
Hey, sorry to reanimate this thread, but just a quick question:  why do the examples (on http://spark.apache.org/examples.html) use "spark" for the SparkContext reference?  This is minor, but it seems like it could be a little confusing for people who want to run them in the shell and need to change "spark" to "sc".  (I noticed because this was a speedbump for a colleague who is trying out Spark.)


thanks,
wb

----- Original Message -----

> From: "Andy Konwinski" <[hidden email]>
> To: [hidden email]
> Sent: Tuesday, May 20, 2014 4:06:33 PM
> Subject: Re: Scala examples for Spark do not work as written in documentation
>
> I fixed the bug, but I kept the parameter "i" instead of "_" since that (1)
> keeps it more parallel to the python and java versions which also use
> functions with a named variable and (2) doesn't require readers to know
> this particular use of the "_" syntax in Scala.
>
> Thanks for catching this Glenn.
>
> Andy

Re: Scala examples for Spark do not work as written in documentation

Patrick Wendell
Those are pretty old - but I think the reason Matei did that was to
make it less confusing for brand new users. `spark` is actually a
valid identifier because it's just a variable name (val spark = new
SparkContext()) but I agree this could be confusing for users who want
to drop into the shell.

On Fri, Jun 20, 2014 at 12:04 PM, Will Benton <[hidden email]> wrote:

> Hey, sorry to reanimate this thread, but just a quick question:  why do the examples (on http://spark.apache.org/examples.html) use "spark" for the SparkContext reference?  This is minor, but it seems like it could be a little confusing for people who want to run them in the shell and need to change "spark" to "sc".  (I noticed because this was a speedbump for a colleague who is trying out Spark.)
>
>
> thanks,
> wb