Straw poll: dropping support for things like Scala 2.10


Re: Straw poll: dropping support for things like Scala 2.10

Steve Loughran

On 27 Oct 2016, at 10:03, Sean Owen <[hidden email]> wrote:

Seems OK by me.
How about Hadoop < 2.6 and Python 2.6? Those seem more removable. I'd like to add them to the list of things that will begin to be unsupported 6 months from now.


If you go to Java 8 only, then Hadoop 2.6+ is mandatory.


On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <[hidden email]> wrote:
that sounds good to me

On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <[hidden email]> wrote:
We can make the following concrete proposal:

1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 2017).

2. In the Spark 2.1.0 release, aggressively and explicitly announce the deprecation of Java 7 / Scala 2.10 support.

(a) It should appear in the release notes and in any documentation that mentions how to build Spark,

(b) and a warning should be shown every time SparkContext is started using Scala 2.10 or Java 7.
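
A minimal sketch of what such a startup warning could look like, assuming a check placed early in SparkContext initialization; the version checks and messages below are illustrative, not Spark's actual implementation:

    // Hypothetical sketch of the deprecation warning Spark could emit at SparkContext startup.
    // In Spark this would go through the internal Logging trait's logWarning; println stands
    // in for it here so the snippet is self-contained and can run in a plain Scala REPL.
    def warnDeprecatedVersions(): Unit = {
      val javaVersion = System.getProperty("java.version")
      if (javaVersion.startsWith("1.7")) {
        println("WARN: Support for Java 7 is deprecated as of Spark 2.1.0 and will be removed in a future release.")
      }
      if (scala.util.Properties.versionNumberString.startsWith("2.10")) {
        println("WARN: Support for Scala 2.10 is deprecated as of Spark 2.1.0 and will be removed in a future release.")
      }
    }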



Re: Straw poll: dropping support for things like Scala 2.10

rxin
I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138




Re: Straw poll: dropping support for things like Scala 2.10

Yanbo Liang-2
+1


Re: Straw poll: dropping support for things like Scala 2.10

Matei Zaharia
In reply to this post by rxin
Just to comment on this, I'm generally against removing these types of things unless they create a substantial burden on project contributors. It doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, but then of course we need to wait for 2.12 to be out and stable.

In general, this type of stuff only hurts users, and doesn't have a huge impact on Spark contributors' productivity (sure, it's a bit unpleasant, but that's life). If we break compatibility this way too quickly, we fragment the user community, and then either people have a crappy experience with Spark because their corporate IT doesn't yet have an environment that can run the latest version, or worse, they create more maintenance burden for us because they ask for more patches to be backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many Linux distros.

In the future, rather than just looking at when some software came out, it may be good to have some criteria for when to drop support for something. For example, if there are really nice libraries in Python 2.7 or Java 8 that we're missing out on, that may be a good reason. The maintenance burden for multiple Scala versions is definitely painful but I also think we should always support the latest two Scala releases.

Matei


Re: Straw poll: dropping support for things like Scala 2.10

Amit Tank
+1 for Matei's point. 


Re: Straw poll: dropping support for things like Scala 2.10

Davies Liu
In reply to this post by Matei Zaharia
+1 for Matei's point.


Re: Straw poll: dropping support for things like Scala 2.10

Felix Cheung
+1 on Matei's.



Re: Straw poll: dropping support for things like Scala 2.10

Sean Owen
In reply to this post by Matei Zaharia
The burden may be a little more apparent when dealing with the day-to-day merging and fixing of breaks. The upside is maybe the more compelling argument, though. For example, lambda-fying all the Java code, supporting java.time, and taking advantage of some newer Hadoop/YARN APIs is a moderate win for users too, and there's also a cost to not doing that.

I must say I don't see a risk of fragmentation as nearly the problem it's made out to be here. We are, after all, here discussing _beginning_ to remove support _in 6 months_, for long since non-current versions of things. An org's decision to not, say, use Java 8 is a decision to not use the new version of lots of things. It's not clear this is a constituency that is either large or one to reasonably serve indefinitely.

In the end, the Scala issue may be decisive. Supporting 2.10 - 2.12 simultaneously is a bridge too far, and if 2.12 requires Java 8, it's a good reason for Spark to require Java 8. And Steve suggests that means a minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of the issue.)

Put another way, I am not sure what the criteria are, if not the above.

I support deprecating all of these things, at the least, in 2.1.0. Although it's a separate question, I believe it's going to be necessary to remove support in ~6 months in 2.2.0.



Re: Straw poll: dropping support for things like Scala 2.10

Ofir Manor
I totally agree with Sean, just a small correction:
Java 7 and Python 2.6 have already been deprecated since Spark 2.0 (after a lengthy discussion), so there is no need to discuss whether they should become deprecated in 2.1.
The discussion is whether Scala 2.10 should also be marked as deprecated (no one is objecting to that), and, more importantly, when to actually move from deprecation to dropping support for any combination of JDK / Scala / Hadoop / Python.

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: [hidden email]



Re: Straw poll: dropping support for things like Scala 2.10

Matei Zaharia
Deprecating them is fine (and I know they're already deprecated); the question is just whether to remove them. For example, what exactly is the downside of keeping Python 2.6 or Java 7 support right now? If it's high, then we can remove them, but I just haven't seen a ton of details. It also sounded like fairly recent versions of CDH, HDP, RHEL, etc. still ship old versions of these.

Just talking with users, I've seen many people who say "we have a Hadoop cluster from $VENDOR, but we just download Spark from Apache and run newer versions of that". That's great for Spark IMO, and we need to stay compatible even with somewhat older Hadoop installs because they are time-consuming to update. Having the whole community on a small set of versions leads to a better experience for everyone and also to more of a "network effect": more people can battle-test new versions, answer questions about them online, write libraries that easily reach the majority of Spark users, etc.

Matei


Re: Straw poll: dropping support for things like Scala 2.10

Matei Zaharia
BTW maybe one key point that isn't obvious is that with YARN and Mesos, the version of Spark used can be solely up to the developer who writes an app, not to the cluster administrator. So even in very conservative orgs, developers can download a new version of Spark, run it, and demonstrate value, which is good both for them and for the project. On the other hand, if they were stuck with, say, Spark 1.3, they'd have a much worse experience and perhaps get a worse impression of the project.
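
As an illustration, a newer Spark can be submitted to an existing YARN cluster without any cluster-side install, because the application ships its own Spark jars. A rough sketch, assuming the newer Spark's jars have been zipped and uploaded to HDFS (the path and app name below are made up) and the job is launched through that distribution's spark-submit with --master yarn:

    // Sketch: run on a newer, user-provided Spark on top of an existing YARN cluster.
    // spark.yarn.archive points YARN at the uploaded Spark jars; the master URL is
    // supplied by spark-submit, and the HDFS path here is purely illustrative.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("newer-spark-on-existing-cluster")
      .set("spark.yarn.archive", "hdfs:///user/alice/spark-2.1.0-jars.zip")
    val sc = new SparkContext(conf)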

Matei


Re: Straw poll: dropping support for things like Scala 2.10

Sean Owen
In reply to this post by Matei Zaharia
If the subtext is vendors, then I'd have a look at what recent distros look like. I'll write about CDH as a representative example, but I think other distros are naturally similar.

CDH has been on Java 8, Hadoop 2.6, and Python 2.7 for almost two years (CDH 5.3 / Dec 2014). Granted, this depends on installing on an OS with that Java / Python version, but Java 8 / Python 2.7 is available for all of the supported OSes. The population that isn't on CDH 4 (support for which was dropped in Spark a long time ago), but is on a version released 2-2.5 years ago and won't update, is a couple percent of the installed base. They do not, in general, want anything to change at all.

I assure everyone that vendors too are aligned in wanting to cater to the crowd that wants the most recent version of everything. For example, CDH offers both Spark 2.0.1 and 1.6 at the same time.

I wouldn't dismiss support for these supporting components as a relevant proxy for whether they are worth supporting in Spark. Java 7 is long since EOL (no, I don't count paying Oracle for support). No vendor is supporting Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a criterion here that reaches a different conclusion about these things just for Spark? This was roughly the same conversation that happened 6 months ago.

I imagine we're going to find that in about 6 months it'll make more sense all around to remove these. If we can just give a heads up with deprecation and then kick the can down the road a bit more, that sounds like enough for now.


Re: Straw poll: dropping support for things like Scala 2.10

Chris Fregly
i seem to remember a large spark user (tencent, i believe) chiming in late during these discussions 6-12 months ago and squashing any sort of deprecation given the massive effort that would be required to upgrade their environment.

i just want to make sure these convos take into consideration large spark users - and reflect the real world versus ideal world.

otherwise, this is all for naught like last time.


Re: Straw poll: dropping support for things like Scala 2.10

Steve Loughran
Twitter just led the release of Hadoop 2.6.5 precisely because they wanted to keep a Java 6 cluster up: the bigger your cluster, the less of a rush to upgrade.

HDP? I believe we install and prefer (OpenJDK) Java 8, but the Hadoop branch-2 line is intended to build/run on Java 7 too. There's always a conflict between us developers' "shiny new features" and ops' "keep the cluster alive". That's actually where Scala has an edge: no need to upgrade the cluster-wide JVM just for an update, or play games configuring your deployed application to use a different JVM from the Hadoop services (which you can do, after all: it's just path setup). Thinking about it, knowing what can be done there, including documenting it in the Spark docs, could be a good migration strategy.
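
To make the "it's just path setup" point concrete, here is a hedged sketch of how a Spark application on YARN could point its own containers at a different JVM than the one the Hadoop daemons use, assuming the newer JDK is installed at the same path on every node (the path is purely illustrative):

    // Sketch: run the YARN application master and the executors on a different JVM
    // than the Hadoop services. Assumes /opt/jdk1.8.0 exists on every node.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.JAVA_HOME", "/opt/jdk1.8.0") // application master
      .set("spark.executorEnv.JAVA_HOME", "/opt/jdk1.8.0")       // executors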

Me? I look forward to when we can use Java 9 to isolate transitive dependencies, the bane of everyone's life. Someone needs to start preparing everything for that to work, though.


Re: Straw poll: dropping support for things like Scala 2.10

Koert Kuipers
That's correct in my experience: we have found a Scala update to be straightforward and basically invisible to ops, but a Java upgrade is a pain because it is managed and "certified" by ops.


Re: Straw poll: dropping support for things like Scala 2.10

Sean Owen
In reply to this post by Sean Owen
I want to bring up the issue of Scala 2.10 support again, to see how people feel about it. Key opinions from the previous responses, I think:

Cody: only drop 2.10 support when 2.12 support is added
Koert: we need all dependencies to support 2.12; Scala updates are pretty transparent to IT/ops 
Ofir: make sure to deprecate 2.10 in Spark 2.1
Reynold: let’s maybe remove support for Scala 2.10 and Java 7 in Spark 2.2
Matei: let’s not remove things unless they’re burdensome for the project; some people are still on old environments that their IT can’t easily update

Scala 2.10 support was deprecated in 2.1, and we did remove Java 7 support for 2.2. https://issues.apache.org/jira/browse/SPARK-14220 tracks the work to support 2.12, and there is progress, especially in dependencies supporting 2.12.

It looks like 2.12 support may even entail a breaking change as documented in https://issues.apache.org/jira/browse/SPARK-14643 and will mean dropping Kafka 0.8, for example. In any event it’s going to take some surgery and a few hacks to make one code base work across 2.11 and 2.12. I don’t see this happening for Spark 2.2.0 because there are just a few weeks left.

Supporting three versions at once is probably infeasible, so dropping 2.10 should precede 2.12 support. Right now, I would like to make progress towards changes that 2.12 will require but that 2.11/2.10 can support. For example, we have to update scalatest, breeze, chill, etc., and can do that before 2.12 is enabled. However, I'm finding those changes tricky, or in one case maybe impossible, while 2.10 is still supported.
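
For context, the pain is that every Scala dependency is published separately per Scala binary version (the _2.10 / _2.11 / _2.12 artifact suffixes). A hedged sbt sketch of what cross-building looks like for a downstream library; Spark itself builds primarily with Maven profiles, so this is illustrative only:

    // build.sbt sketch for a library cross-built against several Scala versions (illustrative).
    scalaVersion := "2.11.8"
    crossScalaVersions := Seq("2.10.6", "2.11.8", "2.12.1")

    // %% appends the Scala binary suffix (scalatest_2.10, _2.11, _2.12), so every such
    // dependency must publish an artifact for every Scala version the build supports.
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % Test

Running sbt +test then compiles and tests against each listed Scala version in turn, which is why one dependency without a _2.12 artifact can block the whole upgrade.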

For 2.2.0, I'm wondering if it makes sense to go ahead and drop 2.10 support, and even get additional prep work for 2.12 into the 2.2.0 release. The move to support 2.12 in 2.3.0 would then be a smaller change. It isn't strictly necessary; we could delay all of that until after 2.2.0 and get it all done between 2.2.0 and 2.3.0. But I wonder if 2.10 is legacy enough at this stage to drop for Spark 2.2.0?

I don’t feel strongly about it but there are some reasonable arguments for dropping it:

- 2.10 doesn’t technically support Java 8, though we do have it working still even after requiring Java 8
- Safe to say virtually all common _2.10 libraries have a _2.11 counterpart at this point?
- 2.10.x was “EOL” in September 2015 with the final 2.10.6 release
- For a vendor viewpoint: CDH only supports Scala 2.11 with Spark 2.x

Before I open a JIRA, just soliciting opinions.


On Tue, Oct 25, 2016 at 4:36 PM Sean Owen <[hidden email]> wrote:
I'd like to gauge where people stand on the issue of dropping support for a few things that were considered for 2.0.

First: Scala 2.10. We've seen a number of build breakages this week because the PR builder only tests 2.11. No big deal at this stage, but it did cause me to wonder whether it's time to plan to drop 2.10 support, especially with 2.12 coming soon.
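
One concrete, if illustrative, example of the kind of divergence that slips through when only 2.11 is tested: Scala 2.11 lifted the 22-parameter limit on case classes, so a definition like the one below compiles on the 2.11 PR builder but fails on 2.10 with "Implementation restriction: case classes cannot have more than 22 parameters." This shows the class of problem, not necessarily one of this week's actual breaks:

    // Compiles on Scala 2.11+; rejected by Scala 2.10 (22-parameter case-class limit).
    case class WideRow(
      c01: Int, c02: Int, c03: Int, c04: Int, c05: Int, c06: Int,
      c07: Int, c08: Int, c09: Int, c10: Int, c11: Int, c12: Int,
      c13: Int, c14: Int, c15: Int, c16: Int, c17: Int, c18: Int,
      c19: Int, c20: Int, c21: Int, c22: Int, c23: Int)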

Next, Java 7. It's reasonably old and out of public updates at this stage. It's not that painful to keep supporting, to be honest. It would simplify some bits of code, some scripts, some testing.

Hadoop versions: I think the general argument is that most anyone would be using, at the least, 2.6, and it would simplify some code that has to use reflection to access not-even-that-new APIs. It would remove some moderate complexity in the build.


"When" is a tricky question. Although it's a little aggressive for minor releases, I think these will all happen before 3.x regardless. 2.1.0 is not out of the question, though coming soon. What about ... 2.2.0?


Although I tend to favor dropping support, I'm mostly asking for current opinions.

Re: Straw poll: dropping support for things like Scala 2.10

Koert Kuipers
Given the issues with Scala 2.10 and Java 8, I am in favor of dropping Scala 2.10 in the next release.

On Sat, Feb 25, 2017 at 2:10 PM, Sean Owen <[hidden email]> wrote:
I want to bring up the issue of Scala 2.10 support again, to see how people feel about it. Key opinions from the previous responses, I think:

Cody: only drop 2.10 support when 2.12 support is added
Koert: we need all dependencies to support 2.12; Scala updates are pretty transparent to IT/ops 
Ofir: make sure to deprecate 2.10 in Spark 2.1
Reynold: let’s maybe remove support for Scala 2.10 and Java 7 in Spark 2.2
Matei: let’s not remove things unless they’re burdensome for the project; some people are still on old environments that their IT can’t easily update

Scala 2.10 support was deprecated in 2.1, and we did remove Java 7 support for 2.2. https://issues.apache.org/jira/browse/SPARK-14220 tracks the work to support 2.12, and there is progress, especially in dependencies supporting 2.12.

It looks like 2.12 support may even entail a breaking change as documented in https://issues.apache.org/jira/browse/SPARK-14643 and will mean dropping Kafka 0.8, for example. In any event it’s going to take some surgery and a few hacks to make one code base work across 2.11 and 2.12. I don’t see this happening for Spark 2.2.0 because there are just a few weeks left.

Supporting three versions at once is probably infeasible, so dropping 2.10 should precede 2.12 support. Right now, I would like to make progress towards changes that 2.12 will require but that 2.11/2.10 can support. For example, we have to update scalatest, breeze, chill, etc. and can do that before 2.12 is enabled. However I’m finding making those changes tricky or maybe impossible in one case while 2.10 is still supported.

For 2.2.0, I’m wondering if it makes sense to go ahead and drop 2.10 support, and even get in additional prep work for 2.12, into the 2.2.0 release. The move to support 2.12 in 2.3.0 would then be a smaller change. It isn’t strictly necessary. We could delay all of that until after 2.2.0 and get it all done between 2.2.0 and 2.3.0. But I wonder if 2.10 is legacy enough at this stage to drop for Spark 2.2.0?

I don’t feel strongly about it but there are some reasonable arguments for dropping it:

- 2.10 doesn’t technically support Java 8, though we do still have it working even after requiring Java 8 (see the interop sketch after this list)
- Safe to say virtually all common _2.10 libraries have a _2.11 counterpart at this point?
- 2.10.x was “EOL” in September 2015 with the final 2.10.6 release
- For a vendor viewpoint: CDH only supports Scala 2.11 with Spark 2.x
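(On the first point, one facet of the Java 8 gap is lambda/SAM interop: Scala 2.12 converts Scala lambdas to Java functional interfaces automatically, while 2.10, and plain 2.11, do not. A small sketch, purely for illustration:)

    import java.util.function.{Function => JFunction}

    object SamInteropSketch {
      // Scala 2.12 accepts a lambda directly thanks to SAM conversion:
      //   val inc: JFunction[Integer, Integer] = x => x + 1

      // Scala 2.10 / 2.11 require the anonymous class to be written out:
      val inc: JFunction[Integer, Integer] = new JFunction[Integer, Integer] {
        override def apply(x: Integer): Integer = x + 1
      }

      def main(args: Array[String]): Unit =
        println(inc.apply(41)) // prints 42
    }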

Before I open a JIRA, just soliciting opinions.




Re: Straw poll: dropping support for things like Scala 2.10

RussS
+1 on removing 2.10


On Thu, Mar 2, 2017 at 8:51 AM Koert Kuipers <[hidden email]> wrote:
Given the issues with Scala 2.10 and Java 8, I am in favor of dropping Scala 2.10 in the next release.



Re: Straw poll: dropping support for things like Scala 2.10

Sean Owen
Let's track further discussion at https://issues.apache.org/jira/browse/SPARK-19810

I am also in favor of removing Scala 2.10 support, and will open a WIP to discuss the change, but am not yet sure whether there are objections or deeper support for this.

On Thu, Mar 2, 2017 at 7:51 PM Russell Spitzer <[hidden email]> wrote:
+1 on removing 2.10


