Scala 3 support approach


Scala 3 support approach

gemelen
Hi all!

I'd like to ask for opinions and discuss the following:
At the moment Spark can, in general, be built with Scala 2.11 and 2.12 (mostly), and is close to having support for Scala 2.13. On the other hand, Scala 3 is entering its pre-release phase (3.0.0-M1 was released at the beginning of October).

Previously, Spark's support for the current Scala version has tended to lag behind the desired state, for various reasons. To handle Scala 3 differently, I'd like to contribute my effort (and help others, if any join in) to supporting it as early as possible, i.e. to get the Spark build compiling with Scala 3 and to have release artifacts once that becomes possible.

I suggest this would require adding an experimental profile to the build file, so that further changes to compile, test, and other tasks could be made incrementally (keeping compatibility with the current code for 2.12 and 2.13, and backporting where possible). I'd like to do it this way because I do not represent any company and contribute in my own time, so I cannot guarantee a consistent amount of time spent on this (and this way the contribution would not end up stranded in a fork if anything happens).

In fact, with the recent changes moving the Spark build to the latest SBT, the initial changes are pretty small on the SBT side (about 10 LOC), and I was already able to see how the build fails with the Scala 3 compiler :)
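
To give a sense of scale, a minimal sketch of what such an experimental entry could look like in an sbt build (the version numbers, flags, and settings below are illustrative, not the actual Spark change):

    // Hypothetical sketch: add a Scala 3 milestone to the cross-build.
    // Versions and flags are illustrative, not the actual Spark build change.
    ThisBuild / crossScalaVersions := Seq("2.12.10", "2.13.3", "3.0.0-M1")

    // Scala 3 does not accept every 2.x compiler flag, so guard them per version.
    scalacOptions ++= {
      CrossVersion.partialVersion(scalaVersion.value) match {
        case Some((3, _)) => Seq.empty                  // Scala 3-specific flags would go here
        case _            => Seq("-Xfatal-warnings")
      }
    }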

To summarize:
1. Is this approach suitable for the project at this moment, so that it can be accepted and accounted for in the release schedule (in 2021, I assume)?
2. How should it be filed: as an umbrella JIRA ticket with smaller tasks, or first as a SPIP with a more thorough analysis?

Re: Scala 3 support approach

gemelen
Sorry for the noise.
Please reply to this one.




Re: Scala 3 support approach

Sean Owen
In reply to this post by gemelen
Spark depends on a number of Scala libraries, so needs them all to support version X before Spark can. This only happened for 2.13 about 4-5 months ago. I wonder if even a fraction of the necessary libraries have 3.0 support yet?

It can be difficult to test and support multiple Scala versions simultaneously. 2.11 has already been dropped and 2.13 is coming, but it might be hard to have a code base that works for 2.12, 2.13, and 3.0.

So one dependency could be: when can 2.12 be dropped? And with Spark supporting 2.13 only early next year, and user apps migrating over a year or more, it seems difficult to do that anytime soon.

I think Scala 3 support is eventually desirable, so maybe the other way to resolve that is to show that Scala 3 support doesn't interfere much with maintenance of 2.12/2.13 support. I am a little bit skeptical of that, just because the 2.11->2.12 and 2.12->2.13 changes were fairly significant, and I'm sure 2.13->3.0 will be even more so, but I don't know.

That is, if we start having to implement workarounds or parallel code trees and so on for 3.0 support, and if it can't be completed for a while because of downstream dependencies, then it may not be worth iterating on in the code base yet, or even considering.

You can file an umbrella JIRA to track it, yes, with a possible target of Spark 4.0. Non-intrusive changes can go in anytime. We may not want to get into major ones until later.


Re: Scala 3 support approach

gemelen
Thanks for the input, Sean.

> Spark depends on a number of Scala libraries, so needs them all to support
> version X before Spark can. This only happened for 2.13 about 4-5 months
> ago. I wonder if even a fraction of the necessary libraries have 3.0
> support yet?

As far as I understand, this is where one of the differences shows up: we can use Scala 2.13 binary dependencies in a Scala 3 project. It was expected that third-party libraries would lag behind the compiler and standard library for a significant amount of time, so Scala 3 is "retro-compatible" with 2.13 in their terms [1][2].
That is also why I was already able to see actual compilation errors while using the full Scala 2.13 dependency set from Spark.

[1]
https://contributors.scala-lang.org/t/scala-2-to-3-transition-some-updates-from-the-scala-center/4013
[2]
https://scalacenter.github.io/scala-3-migration-guide/docs/compatibility.html#a-scala-3-module-depending-on-a-scala-2-artifact
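
To illustrate how that looks on the build side, here is a minimal sketch, assuming sbt 1.5+ where CrossVersion.for3Use2_13 is available (earlier Dotty-era builds used the sbt-dotty plugin's withDottyCompat instead); the library and versions below are just examples of a 2.13-only artifact:

    // Sketch: a Scala 3 module depending on an artifact published only for Scala 2.13.
    // Assumes sbt 1.5+ (CrossVersion.for3Use2_13); library and versions are examples.
    scalaVersion := "3.0.0-M1"

    libraryDependencies += ("org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.0")
      .cross(CrossVersion.for3Use2_13)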

I believe this changes the perspective a bit: only the code changes are left to consider, namely how far they diverge from the current state of the sources and whether it is possible to maintain them in the same source tree.
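
On the "same source tree" point: the usual way to keep a single code base across 2.12, 2.13, and 3.x is per-Scala-version source directories, so that only the incompatible files fork. A minimal sketch of that convention (nothing Spark-specific, just the standard sbt pattern; recent sbt adds these directories automatically, shown explicitly here for clarity):

    // Sketch: per-Scala-version source directories for one shared code base.
    // Only files that differ between Scala versions live in these directories.
    Compile / unmanagedSourceDirectories ++= {
      val base = (Compile / sourceDirectory).value      // typically src/main
      CrossVersion.partialVersion(scalaVersion.value) match {
        case Some((2, 12)) => Seq(base / "scala-2.12")
        case Some((2, 13)) => Seq(base / "scala-2.13")
        case Some((3, _))  => Seq(base / "scala-3")
        case _             => Seq.empty
      }
    }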




Re: Scala 3 support approach

Dongjoon Hyun
Hi, Denis

We are currently moving toward Scala 3 together by focusing on completing SPARK-25075 first, as a stepping stone.

    https://issues.apache.org/jira/browse/SPARK-25075
    Build and test Spark against Scala 2.13

We haven't finished it yet. We still need Jenkins jobs that build with Scala 2.13.
Also, we should complete PySpark/SparkR and all integration testing on a Scala 2.13-built distribution.

In addition, although we support Scala 2.12 in Apache Spark 3.1.0, we couldn't use Scala 2.12.12 due to a regression in it.


Given the above, although I'm happy with the recent progress in this area, we definitely need more effort and time.

I'd like to discuss this issue after SPARK-25075 is complete.
At that point, the gap between the AS-IS and TO-BE states will be smaller than it is now.

Bests,
Dongjoon.




Re: Scala 3 support approach

Koert Kuipers
In reply to this post by Sean Owen
I think Scala 3.0 will be able to use libraries built with Scala 2.13 (as long as they don't use macros).

see:


Re: Scala 3 support approach

Dongjoon Hyun
Hi, Koert.

We know that, welcome it, and believe it. However, so far this is only the Scala community's roadmap; it doesn't mean Apache Spark supports Scala 3 officially.

For example, Apache Spark 3.0.1 supports Scala 2.12.10 but not 2.12.12, due to a Scala issue.

In the Apache Spark community, we had better focus on 2.13 first. After that, we will see what is needed for Scala 3.

Bests,
Dongjoon.
