Starting to make changes for Spark 3 -- what can we delete?

Sean Owen-3
There was already agreement to delete deprecated things like Flume and
Kafka 0.8 support in master. I've got several more on my radar, and
wanted to highlight them and solicit general opinions on where we
should accept breaking changes.

For example, how about removing accumulator v1?
https://github.com/apache/spark/pull/22730

Or using the standard Java Optional?
https://github.com/apache/spark/pull/22383
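
For readers unfamiliar with the standard class: Spark's Java API ships its own Optional type (org.apache.spark.api.java.Optional), and the proposal is to use java.util.Optional instead. Below is a minimal sketch of the standard type only, not code from the pull request. The class name, the findOwner and wrap helpers, and the sample values are all hypothetical, made up for illustration.

```java
import java.util.Optional;

class OptionalSketch {
    // Hypothetical lookup standing in for any API that may have no value to return.
    static Optional<String> findOwner(String table) {
        return "events".equals(table) ? Optional.of("etl-team") : Optional.empty();
    }

    // Wraps a possibly-null value; ofNullable returns an empty Optional for null.
    static Optional<String> wrap(String value) {
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        // orElse supplies a fallback when no value is present.
        System.out.println(findOwner("events").orElse("unknown"));
        System.out.println(findOwner("missing").orElse("unknown"));
        System.out.println(wrap(null).isPresent());
    }
}
```

Caller code written against a project-specific Optional would have to be checked against these method names, which is what would make such a switch a breaking change.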

Or cleaning up some old workarounds and APIs while at it?
https://github.com/apache/spark/pull/22729 (still in progress)

I think I talked myself out of replacing Java function interfaces with
java.util.function because...
https://issues.apache.org/jira/browse/SPARK-25369

There are also, say, the old JSON, CSV, and Avro reading methods
deprecated since 1.4. Remove?
Anything deprecated since 2.0.0?

Interested in general thoughts on these.

Here are some more items targeted to 3.0:
https://issues.apache.org/jira/browse/SPARK-17875?jql=project%3D%22SPARK%22%20AND%20%22Target%20Version%2Fs%22%3D%223.0.0%22%20ORDER%20BY%20priority%20ASC

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Starting to make changes for Spark 3 -- what can we delete?

Marcelo Vanzin-2
Might be good to take a look at things marked "@DeveloperApi" and
whether they should stay that way.

e.g. I was looking at SparkHadoopUtil and I've always wanted to just
make it private to Spark. I don't see why apps would need any of those
methods.
On Tue, Oct 16, 2018 at 10:18 AM Sean Owen <[hidden email]> wrote:

> There was already agreement to delete deprecated things like Flume and
> Kafka 0.8 support in master. [...]


--
Marcelo

Re: Starting to make changes for Spark 3 -- what can we delete?

Marco Gaido
Hi all,

I think a very big topic here would be: what do we want to do with the old mllib API? I have long been told that it was going to be removed in 3.0. Is this still the plan?

Thanks,
Marco

On Wed, Oct 17, 2018 at 03:11, Marcelo Vanzin <[hidden email]> wrote:
> Might be good to take a look at things marked "@DeveloperApi" and
> whether they should stay that way.
>
> e.g. I was looking at SparkHadoopUtil and I've always wanted to just
> make it private to Spark. I don't see why apps would need any of those
> methods.

Re: Starting to make changes for Spark 3 -- what can we delete?

Erik Erlandson-2
My understanding was that the legacy mllib API was frozen, with all new development going to ML, but that it was not going to be removed. That said, removing it would get rid of a lot of `OldXxx` shims.

On Wed, Oct 17, 2018 at 12:55 AM Marco Gaido <[hidden email]> wrote:
> Hi all,
>
> I think a very big topic here would be: what do we want to do with the old mllib API? I have long been told that it was going to be removed in 3.0. Is this still the plan?
>
> Thanks,
> Marco

Re: Starting to make changes for Spark 3 -- what can we delete?

DB Tsai-5
I'm +1 on removing the legacy mllib code. Many users are confused about the APIs, and some of them have odd behaviors (for example, in gradient descent, the intercept is regularized although it is not supposed to be).
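
To illustrate the intercept point, here is a minimal sketch of one gradient-descent step for least squares with an L2 penalty, where the intercept is left out of the penalty. This is not mllib code. The class name, the method signature, and the convention that weights[0] holds the intercept are assumptions made for the example.

```java
class InterceptSketch {
    // One gradient-descent step on (1/2n) * sum(err^2) + (lambda/2) * sum_{j>=1} w_j^2.
    // weights[0] is treated as the intercept and is deliberately NOT penalized.
    static double[] step(double[] weights, double[][] x, double[] y,
                         double learningRate, double lambda) {
        int n = x.length;
        double[] grad = new double[weights.length];
        for (int i = 0; i < n; i++) {
            // Prediction: intercept plus dot product of coefficients and features.
            double pred = weights[0];
            for (int j = 1; j < weights.length; j++) {
                pred += weights[j] * x[i][j - 1];
            }
            double err = pred - y[i];
            grad[0] += err / n;
            for (int j = 1; j < weights.length; j++) {
                grad[j] += err * x[i][j - 1] / n;
            }
        }
        double[] next = new double[weights.length];
        // The intercept update uses the data gradient only.
        next[0] = weights[0] - learningRate * grad[0];
        // Coefficient updates add the L2 penalty term lambda * w_j.
        for (int j = 1; j < weights.length; j++) {
            next[j] = weights[j] - learningRate * (grad[j] + lambda * weights[j]);
        }
        return next;
    }
}
```

With data the model already fits exactly, such a step shrinks the coefficients but leaves the intercept where it is. An implementation that also penalizes the intercept would pull it toward zero as well.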

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc

> On Oct 17, 2018, at 7:42 AM, Erik Erlandson <[hidden email]> wrote:
>
> My understanding was that the legacy mllib API was frozen, with all new development going to ML, but that it was not going to be removed. That said, removing it would get rid of a lot of `OldXxx` shims.