Breaking API changes in Spark 3.0


Karen Feng
Hi all,

I am concerned that the API-breaking changes in SPARK-25908 (as well as
SPARK-16775, and potentially others) will make the migration process from
Spark 2 to Spark 3 unnecessarily painful. For example, the removal of
SQLContext.getOrCreate will break a large number of libraries currently
built on Spark 2.
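
For example, one pattern that keeps a library compiling against both
lines is to go through SparkSession, which exists in 2.0+ and 3.x. A
minimal sketch (CompatShim is a hypothetical name, not from any
particular library):

    import org.apache.spark.sql.{SQLContext, SparkSession}

    object CompatShim {
      // Stand-in for the removed SQLContext.getOrCreate: resolve the
      // active (or a new) session and hand back its SQLContext.
      def sqlContext: SQLContext =
        SparkSession.builder().getOrCreate().sqlContext
    }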

Even if library developers do not use deprecated APIs, API changes between
2.x and 3.x will result in inconsistencies that require hacking around. For
a fairly small and new (2.4.3+) genomics library, I had to create a number
of shims (https://github.com/projectglow/glow/pull/155) for the source and
test code due to API changes in SPARK-25393, SPARK-27328, SPARK-28744.
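
The general shape of such shims (a hedged sketch with hypothetical
names, not the actual layout in that PR) is a single trait in shared
sources, with one implementation per Spark major version selected at
build time:

    // src/main/scala/.../SparkShims.scala (shared sources)
    trait SparkShims {
      // e.g., wrap a constructor or helper whose signature changed
      // between 2.x and 3.x behind a stable method
      def shimDescription: String
    }

    // src/main/spark-2/.../Spark2Shims.scala and
    // src/main/spark-3/.../Spark3Shims.scala each implement SparkShims
    // and are compiled only against the matching Spark line.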

As a best practice, we should avoid breaking existing APIs in order to
ease library development. To avoid dealing with similar deprecated-API
issues down the road, we should also exercise more prudence when
considering new API proposals.

I'd love to see more discussion on this.





Re: Breaking API changes in Spark 3.0

Dongjoon Hyun-2
Hi, Karen.

Are you saying that Spark 3 has to keep all of the deprecated 2.x APIs?
Could you tell us your criteria for `unnecessarily` vs. `necessarily`?

> the migration process from Spark 2 to Spark 3 unnecessarily painful.

Bests,
Dongjoon.




Re: Breaking API changes in Spark 3.0

Holden Karau
So my understanding is that, to provide a reasonable migration path, we'd want the replacement for a deprecated API to also exist in 2.4; that way, libraries and programs can dual-target both versions during the migration process.

That isn't always going to be doable, but it's certainly worth looking at the situations where we aren't providing a smooth migration path and making sure the break is the best thing to do.
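
One hedged sketch of what dual-targeting can look like at the build
level (build.sbt for sbt 1.x; the system property name is illustrative):

    // Build the same sources against either line, e.g.:
    //   sbt -Dspark.version=2.4.5 test
    //   sbt -Dspark.version=3.0.0 test
    // (scalaVersion must be one that both lines publish, e.g. 2.12.)
    val sparkVersion = sys.props.getOrElse("spark.version", "2.4.5")

    libraryDependencies +=
      "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"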


Re: Breaking API changes in Spark 3.0

Xiao Li-2
For example, in https://github.com/apache/spark/pull/23131 we added back unionAll.
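
The add-back is cheap because unionAll can stay as essentially a thin
alias for union. A small demonstration of the (identical) semantics; on
2.x this compiles with a deprecation warning:

    import org.apache.spark.sql.SparkSession

    object UnionAllDemo extends App {
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._

      val a = Seq(1, 2).toDS()
      val b = Seq(2, 3).toDS()

      // Both keep duplicates (neither behaves like SQL's UNION DISTINCT),
      // so keeping the legacy name costs little to maintain.
      assert(a.union(b).collect().sorted.sameElements(Array(1, 2, 2, 3)))
      assert(a.unionAll(b).collect().sorted.sameElements(Array(1, 2, 2, 3)))

      spark.stop()
    }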

We might need to double-check whether we removed any widely used APIs in this release before the RC. If the maintenance cost is small, keeping some deprecated APIs looks reasonable to me, and doing so can help the adoption of Spark 3.0. We need to discuss the APIs case by case.

Xiao


Re: Breaking API changes in Spark 3.0

Dongjoon Hyun-2
Sure. I understand the background of the following requests, so it's a good time to decide on the criteria in order to start the discussion.

    1. "to provide a reasonable migration path we’d want the replacement of the deprecated API to also exist in 2.4"
    2. "We need to discuss the APIs case by case"

For now, it's unclear what counts as `unnecessarily painful`, which APIs are "widely used", or how small "small maintenance costs" must be.

I'm wondering whether the goal of Apache Spark 3.0.0 is to be 100% backward compatible with Apache Spark 2.4.5, like Apache Kafka.
Are we going to revert all of the changes? If there had been clear criteria, we wouldn't have needed to spend such a long stretch of the 3.0.0 cycle on cleanup.

BTW, to be clear, we are talking about 2.4.5 and 3.0.0 compatibility in this thread.

Bests,
Dongjoon.



Re: Breaking API changes in Spark 3.0

Jungtaek Lim-2
Apache Spark 2.0 was released in July 2016. Assuming the project has been trying its best to follow semantic versioning, that means we have waited more than three years to make breaking changes. Any necessary breaking change the community fails to make now becomes technical debt for another 3+ years.

Regarding the PRs removing deprecated APIs that were pointed out, I'm not sure I see the problem. I roughly remember that those PRs target APIs that were deprecated a couple of minor versions ago. If so, what's the matter?

If the deprecation messages don't clearly point to alternatives, then that is a real problem the community should be concerned about and try to fix, but it's a separate one. The community doesn't deprecate APIs just for fun. Every deprecation has a reason, and not removing a deprecated API doesn't make sense unless the community was mistaken about that reason.

If the community really wants to build some (soft) rules/policies on deprecation, I can only imagine two items (a sketch of item 2 follows below):

1. Define a "minimum number of releases to live" (either per deprecated API or globally).
2. Never skip describing the reason for the deprecation, and try our best to describe an alternative that works the same or similarly; if the alternative doesn't work exactly the same, also describe the difference.

I cannot imagine any other problems with deprecation.
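
A sketch of what item 2 can look like in code (the method and message
here are illustrative, not Spark's actual annotation):

    import org.apache.spark.sql.{SQLContext, SparkSession}

    object DeprecationSketch {
      // The message names the alternative *and* the behavioral
      // difference, so callers can migrate without digging through JIRA.
      @deprecated(
        "Use SparkSession.builder().getOrCreate().sqlContext instead; " +
          "the result is tied to the active session rather than a " +
          "standalone context.",
        "2.0.0")
      def legacySqlContext(): SQLContext =
        SparkSession.builder().getOrCreate().sqlContext
    }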


Re: Breaking API changes in Spark 3.0

Jungtaek Lim-2
I think I read too hastily and focused only on the first sentence of Karen's message. Sorry about that.

While, as I said, I'm not sure I can agree with the point about deprecation and breaking API changes, the thread raises another topic that seems like good input: our practice around new API proposals. I feel that deserves a separate thread, though.

Maybe we can make API deprecation a "heavy-weight" operation to mitigate the impact a bit, e.g., by requiring a discussion thread to reach consensus before the PR goes in. Right now, if you only subscribe to dev@, you have no idea which APIs are going to be deprecated and why. Even if you subscribe to issues@, you would miss it among the flood of issues.

Personally, I feel the root cause is that dev@ is very quiet compared to the volume of PRs the community receives and the impact of the changes those PRs make. I agree we should strike a balance here to avoid restricting ourselves too much, but I feel there's no balance now: most things just go through PRs without discussion. It would be ideal to take some time to consider this.



Re: Breaking API changes in Spark 3.0

Holden Karau
So here is my view of how the removal of common and stable APIs should go (to be clear, exceptions can and do make sense):
1) Deprecate the API
2) Release the replacement API
3) Provide migration guidance (ideally in the deprecated annotation, but possibly in the release notes or elsewhere)
4) Remove the old API

Ideally, steps 1, 2, and 3 should occur in a release prior to step 4; a sketch follows below. If this is not possible, I think a quick discussion on the dev list is reasonable given the potential impact on our users. The preview release is a good opportunity for us to get an idea of whether a removal is going to have a really large impact.
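
A sketch of steps 1-3 landing together in one release (all names here
are hypothetical):

    object TableApi {
      // Step 2: the replacement ships in the same release...
      def loadTable(name: String): String = s"loaded $name"

      // Steps 1 and 3: the old entry point is deprecated, and the
      // annotation itself carries the migration guidance. Step 4
      // (removal) waits for a later major release, so callers get at
      // least one release in which they can dual-target.
      @deprecated("Use loadTable(name); behavior is otherwise identical.", "2.4.0")
      def load(name: String): String = loadTable(name)
    }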

We've felt this kind of pain ourselves as developers building on top of Scala, and knowing how painful that has been in our own experience, I'd like us to minimize the pain of this type that we pass on to our users.
And having the conversation has utility beyond the decision itself: the discussion will be visible to users searching later, so they can see the rationale and, hopefully, migration suggestions.


On Thu, Feb 20, 2020 at 8:50 AM Jungtaek Lim <[hidden email]> wrote:

> 1. Define a "minimum number of releases to live" (either per deprecated API or globally).
> 2. Never skip describing the reason for the deprecation, and try our best to describe an alternative that works the same or similarly; if the alternative doesn't work exactly the same, also describe the difference.

Those guidelines seem reasonable to me. I've written a bit more above about what I'd expect us to be doing as a project with as many downstream consumers as we have.

On Thu, Feb 20, 2020 at 7:36 AM Dongjoon Hyun <[hidden email]> wrote:

> For now, it's unclear what counts as `unnecessarily painful`, which APIs are "widely used", or how small "small maintenance costs" must be.

I think these are all case by case. For example, to me, in the original situation that kicked off this thread, SQLContext.getOrCreate probably doesn't need to keep existing, given that SparkSession.builder's getOrCreate has been available for several releases and the old method has been deprecated.