RDD.top() stacktrace

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

RDD.top() stacktrace

Andrew Ash
Spark 0.9.0


Hi Spark devs,

I'm pretty sure this stacktrace is a bug in the way Spark is using the type
system but I don't quite know what it is.  Something to do with type bounds
judging from my Googling.

Can someone with more Scala-foo than me please take a look?  In the
meantime I'll be avoiding top() for a bit.

Thanks!
Andrew



This stracktrace came about when I called
val myRDD: RDD[(String,Int)] = ...
myRDD.reduceByKey(_+_).top(100)

But my toy example doesn't trigger the repro:
sc.parallelize(Seq( ("A",10), ("B",5), ("A",4), ("C", 15)
)).reduceByKey(_+_).top(2)



14/02/14 03:15:07 ERROR OneForOneStrategy:
scala.collection.immutable.$colon$colon cannot be cast to
org.apache.spark.util.BoundedPriorityQueue
java.lang.ClassCastException: scala.collection.immutable.$colon$colon
cannot be cast to org.apache.spark.util.BoundedPriorityQueue
        at org.apache.spark.rdd.RDD$$anonfun$top$2.apply(RDD.scala:873)
        at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:671)
        at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:668)
        at
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
        at
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:859)
        at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:616)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Reply | Threaded
Open this post in threaded view
|

Re: RDD.top() stacktrace

Xuefeng Wu
Hi Andrew,

Sorry, I can not reproduce the issue by:

scala> import org.apache.spark.rdd.RDD

scala> val myRDD: RDD[(String,Int)] = sc.parallelize(Seq( ("A",10),
("B",5), ("A",4), ("C", 15)))

scala> myRDD.reduceByKey(_+_).top(2)

Any different compare with your example ?



On Fri, Feb 14, 2014 at 7:38 PM, Andrew Ash <[hidden email]> wrote:

> Spark 0.9.0
>
>
> Hi Spark devs,
>
> I'm pretty sure this stacktrace is a bug in the way Spark is using the type
> system but I don't quite know what it is.  Something to do with type bounds
> judging from my Googling.
>
> Can someone with more Scala-foo than me please take a look?  In the
> meantime I'll be avoiding top() for a bit.
>
> Thanks!
> Andrew
>
>
>
> This stracktrace came about when I called
> val myRDD: RDD[(String,Int)] = ...
> myRDD.reduceByKey(_+_).top(100)
>
> But my toy example doesn't trigger the repro:
> sc.parallelize(Seq( ("A",10), ("B",5), ("A",4), ("C", 15)
> )).reduceByKey(_+_).top(2)
>
>
>
> 14/02/14 03:15:07 ERROR OneForOneStrategy:
> scala.collection.immutable.$colon$colon cannot be cast to
> org.apache.spark.util.BoundedPriorityQueue
> java.lang.ClassCastException: scala.collection.immutable.$colon$colon
> cannot be cast to org.apache.spark.util.BoundedPriorityQueue
>         at org.apache.spark.rdd.RDD$$anonfun$top$2.apply(RDD.scala:873)
>         at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:671)
>         at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:668)
>         at
> org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
>         at
>
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:859)
>         at
>
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:616)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>



--

~Yours, Xuefeng Wu/吴雪峰  敬上
Reply | Threaded
Open this post in threaded view
|

Re: RDD.top() stacktrace

Andrew Ash
I'm sorry I do not yet have a consistent repro that doesn't use my internal
dataset.  That example is the repro I attempted to create but it does not
trigger the issue.  Hopefully I'll be able to get a consistent repro to
share.


On Fri, Feb 14, 2014 at 3:58 AM, Xuefeng Wu <[hidden email]> wrote:

> Hi Andrew,
>
> Sorry, I can not reproduce the issue by:
>
> scala> import org.apache.spark.rdd.RDD
>
> scala> val myRDD: RDD[(String,Int)] = sc.parallelize(Seq( ("A",10),
> ("B",5), ("A",4), ("C", 15)))
>
> scala> myRDD.reduceByKey(_+_).top(2)
>
> Any different compare with your example ?
>
>
>
> On Fri, Feb 14, 2014 at 7:38 PM, Andrew Ash <[hidden email]> wrote:
>
> > Spark 0.9.0
> >
> >
> > Hi Spark devs,
> >
> > I'm pretty sure this stacktrace is a bug in the way Spark is using the
> type
> > system but I don't quite know what it is.  Something to do with type
> bounds
> > judging from my Googling.
> >
> > Can someone with more Scala-foo than me please take a look?  In the
> > meantime I'll be avoiding top() for a bit.
> >
> > Thanks!
> > Andrew
> >
> >
> >
> > This stracktrace came about when I called
> > val myRDD: RDD[(String,Int)] = ...
> > myRDD.reduceByKey(_+_).top(100)
> >
> > But my toy example doesn't trigger the repro:
> > sc.parallelize(Seq( ("A",10), ("B",5), ("A",4), ("C", 15)
> > )).reduceByKey(_+_).top(2)
> >
> >
> >
> > 14/02/14 03:15:07 ERROR OneForOneStrategy:
> > scala.collection.immutable.$colon$colon cannot be cast to
> > org.apache.spark.util.BoundedPriorityQueue
> > java.lang.ClassCastException: scala.collection.immutable.$colon$colon
> > cannot be cast to org.apache.spark.util.BoundedPriorityQueue
> >         at org.apache.spark.rdd.RDD$$anonfun$top$2.apply(RDD.scala:873)
> >         at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:671)
> >         at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:668)
> >         at
> > org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
> >         at
> >
> >
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:859)
> >         at
> >
> >
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:616)
> >         at
> >
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >         at
> >
> >
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >         at
> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >         at
> >
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >         at
> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >         at
> >
> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
>
>
>
> --
>
> ~Yours, Xuefeng Wu/吴雪峰  敬上
>