Scala 2.13 actual class used for Seq


Koert Kuipers
I have gotten used to Spark always returning a WrappedArray for Seq. At some point I think I even read that this was guaranteed to be the case. Not sure if it still is...

In Spark 3.0.1 with Scala 2.12 I get a WrappedArray as expected:

scala> val x = Seq((1,2),(1,3)).toDF
x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class_of_3", udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
+---+------+-------------------------------------------------+
|_1 |_3    |class_of_3                                       |
+---+------+-------------------------------------------------+
|1  |[2, 3]|class scala.collection.mutable.WrappedArray$ofRef|
+---+------+-------------------------------------------------+

But when I build current master with Scala 2.13 I get:

scala> val x = Seq((1,2),(1,3)).toDF
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation' or `:replay -deprecation'
val x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class", udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
+---+------+---------------------------------------------+
|_1 |_3    |class                                        |
+---+------+---------------------------------------------+
|1  |[2, 3]|class scala.collection.immutable.$colon$colon|
+---+------+---------------------------------------------+

I am curious whether we are planning on returning an immutable Seq going forward (which would be nice). If so, is List the best choice? I was sort of guessing it would be an immutable ArraySeq (given that it provides efficient ways to wrap an array and access the underlying array).

best
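
For readers unfamiliar with the ArraySeq suggestion above, here is a minimal sketch of what "wrap an array and access the underlying array" looks like with the Scala 2.13 standard library. It only illustrates the collection API; it does not show what Spark does internally, and the object and method names (ArraySeqWrapDemo, wrap) are made up for the example.

```scala
import scala.collection.immutable.ArraySeq

// Hypothetical demo object; requires Scala 2.13 (immutable.ArraySeq does not exist in 2.12).
object ArraySeqWrapDemo {
  // Wraps the array without copying it. The result is only "immutable" as long as
  // nobody mutates the original array afterwards, hence the "unsafe" in the name.
  def wrap(values: Array[Int]): ArraySeq[Int] = ArraySeq.unsafeWrapArray(values)

  def main(args: Array[String]): Unit = {
    val backing = Array(2, 3)
    val wrapped = wrap(backing)

    // immutable.ArraySeq is an IndexedSeq, so positional access is constant time.
    println(wrapped.isInstanceOf[IndexedSeq[_]])
    println(wrapped(1))

    // unsafeArray hands back the very same array instance that was wrapped.
    println(wrapped.unsafeArray eq backing)
  }
}
```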

Re: Scala 2.13 actual class used for Seq

Sean Owen-2
Scala 2.13 changed the default Seq alias (scala.Seq) to point to immutable.Seq, yes. So lots of things will now return an immutable Seq. Almost all of the code doesn't care which Seq it returns, and we didn't change any of that, so this is just what we get by 'default' from whatever operations produce the Seq. (A user app expecting a Seq in 2.13 will still just work, as it will be expecting an immutable.Seq then.)

You're right that many things don't necessarily return a WrappedArray anymore (I think that doesn't exist anymore in 2.13? ArraySeq now?), so user apps may need to change for 2.13, but there are N things that any 2.13 app would have to change anyway.
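
To make the alias change concrete, here is a small sketch in plain Scala (no Spark involved) of one of the things a 2.13 app typically has to adjust. The names SeqAliasDemo and total are hypothetical. On 2.12 a mutable buffer can be passed directly where a Seq is expected; on 2.13 it cannot, and calling toSeq is a source-compatible way to write it for both versions.

```scala
import scala.collection.mutable

// Hypothetical demo object; compiles on both Scala 2.12 and 2.13.
object SeqAliasDemo {
  // `Seq` here is scala.Seq: collection.Seq on 2.12, immutable.Seq on 2.13.
  def total(xs: Seq[Int]): Int = xs.sum

  def main(args: Array[String]): Unit = {
    val buf = mutable.ArrayBuffer(1, 2, 3)

    // total(buf) compiles on 2.12 but not on 2.13, because a mutable
    // ArrayBuffer is not an immutable.Seq.
    // toSeq compiles on both; on 2.13 it copies the elements into an immutable Seq.
    println(total(buf.toSeq))
  }
}
```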

(In reply to Koert Kuipers's message of Mon, Oct 19, 2020, 12:29 AM.)

Re: Scala 2.13 actual class used for Seq

Koert Kuipers
I rebuilt master for Scala 2.12 and I see it also uses List instead of WrappedArray. So the change is in master (compared to 3.0.1) and is not limited to Scala 2.13.
This might impact user programs somewhat? List has different performance characteristics than WrappedArray... For starters, it is not an IndexedSeq.
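
As an illustration of the performance point, here is a minimal sketch of how user code (for example the body of a UDF taking a Seq) could protect itself when it needs positional access: keep the input if it is already indexed, otherwise copy it once into a Vector. This is just one possible approach, not something Spark provides; IndexedAccess, toIndexed and middle are hypothetical names.

```scala
// Hypothetical helper; compiles on both Scala 2.12 and 2.13.
object IndexedAccess {
  // Returns the input unchanged if it already supports efficient indexing,
  // otherwise copies it once into a Vector.
  def toIndexed[A](xs: scala.collection.Seq[A]): scala.collection.IndexedSeq[A] =
    xs match {
      case indexed: scala.collection.IndexedSeq[A] => indexed
      case other                                   => other.toVector
    }

  // Example of positional access: on a List, xs(i) walks i nodes,
  // so normalizing first avoids repeated linear-time lookups.
  def middle(xs: scala.collection.Seq[Int]): Option[Int] = {
    val ix = toIndexed(xs)
    if (ix.isEmpty) None else Some(ix(ix.length / 2))
  }
}
```

With this, the calling code behaves the same whether it is handed a WrappedArray, an ArraySeq or a List.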


(In reply to Sean Owen's message of Mon, Oct 19, 2020, 8:24 AM.)

Re: Scala 2.13 actual class used for Seq

Sean Owen-2
It's possible the changes do alter the concrete return type in 2.12 too, though no API interface types should change. I recall that, because 2.13 makes WrappedArray a typedef (so it is not gone, actually), some code that expected it had to change in order to work on both 2.12 and 2.13. Apps shouldn't depend on the concrete implementation, of course, but yes, that could be an issue if some code is expecting a particular collection class.

(In reply to Koert Kuipers's message of Mon, Oct 19, 2020, 11:17 AM.)