[DISCUSS] Drop Python 2, 3.4 and 3.5

[DISCUSS] Drop Python 2, 3.4 and 3.5

Hyukjin Kwon
Hi all,

I would like to discuss dropping the deprecated Python versions 2, 3.4 and 3.5 at https://github.com/apache/spark/pull/28957. I assume people support this in general,
but I am writing to make sure everybody is happy.

Fokko did a very thorough investigation; see https://github.com/apache/spark/pull/28957#issuecomment-652022449.
Judging from those statistics, I think we are pretty safe to drop them.
Also note that dropping Python 2 was already announced at https://python3statement.org/.

Roughly speaking, dropping them has several main advantages:
  1. It removes a bunch of compatibility hacks, around 700 lines, in PySpark (see the sketch right after this list).
  2. PyPy2 has a critical bug that causes a flaky test, https://issues.apache.org/jira/browse/SPARK-28358, per my testing and investigation.
  3. Users can use Python type hints with Pandas UDFs without worrying about the Python version (see the example at the end of this message).
  4. Users can leverage the latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+, it can also leverage the C pickle implementation.
  5. ...
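
To make point 1 concrete, here is a purely illustrative sketch (an assumed example, not an actual PySpark diff) of the kind of Python 2/3 compatibility branching that becomes dead code once only Python 3.6+ is supported:

    import sys

    if sys.version_info[0] < 3:
        # Python 2 aliases that compatibility shims like this had to carry around.
        string_types = basestring  # noqa: F821
        integer_types = (int, long)  # noqa: F821
    else:
        # Once Python 2 support is gone, only the plain Python 3 names remain
        # and the whole branch can be deleted.
        string_types = str
        integer_types = (int,)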

So it benefits both users and devs. WDYT?
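
For point 3, here is a minimal sketch of the type-hint-based Pandas UDF style introduced in Spark 3.0, which a Python-3.6+-only PySpark can rely on uniformly (the column name and data are made up for illustration):

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.getOrCreate()

    # The pd.Series -> pd.Series type hints tell Spark which Pandas UDF
    # variant this is, instead of passing an explicit functionType argument.
    @pandas_udf("long")
    def plus_one(v: pd.Series) -> pd.Series:
        return v + 1

    spark.range(3).toDF("x").select(plus_one("x")).show()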


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Holden Karau
To be clear, the plan is to drop them from Spark 3.1 onwards, yes?



--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Hyukjin Kwon
Yeah, sure. They will be dropped from Spark 3.1 onwards. I don't think we should make such changes in maintenance releases.


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Holden Karau
I’m ok with us dropping Python 2, 3.4, and 3.5 from Spark 3.1 onward. It will be exciting to get to use more recent Python features. The most recent Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if folks really can’t upgrade there’s conda.

Is there anyone with a large Python 3.5 fleet who can’t use conda?


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Yuanjian Li
+1, especially Python 2


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Dongjoon Hyun
Thank you, Hyukjin.

According to the Python community, Python 3.5 also reaches end of life (EOL) on 2020-09-13 (only two months from now).

- https://www.python.org/downloads/

So, targeting only still-supported Python versions for Apache Spark 3.1.0 (December 2020) looks reasonable to me.

For old Python versions, we still have Apache Spark 2.4 LTS, and Apache Spark 3.0.x will also work.

Bests,
Dongjoon.



Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Hyukjin Kwon
Thanks Dongjoon. That makes much more sense now!


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Hyukjin Kwon
Thank you all. Python 2, 3.4 and 3.5 have now been dropped in the master branch: https://github.com/apache/spark/pull/28957


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Holden Karau
Awesome, thank you for driving this forward :)


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

shane knapp ☠
this is seriously great news!  let's all take a moment and welcome apache spark's python support to the present.  ;)



--
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

Holden Karau
I’m going to drink a celebratory afternoon coffee :)
