Apache Spark 3.0.2 Release ?

Dongjoon Hyun
Hi, All.

As of today, `branch-3.0` has 307 patches (including 25 correctness patches) since the v3.0.1 tag (released on September 8th, 2020).
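
For readers who want to check a count like this against a local clone, here is a minimal sketch in Python. It assumes `git` is installed and that the clone contains the `v3.0.1` tag and a `branch-3.0` ref (a fresh clone may need `origin/branch-3.0` instead). The helper names `rev_list_count_cmd` and `count_patches` are made up for illustration, and the number returned depends on when the branch is inspected.

```python
import subprocess


def rev_list_count_cmd(base_tag, branch):
    """Build the git command that counts commits on `branch` not reachable from `base_tag`."""
    return ["git", "rev-list", "--count", f"{base_tag}..{branch}"]


def count_patches(repo_dir, base_tag="v3.0.1", branch="branch-3.0"):
    """Run the command inside a local clone and parse the single integer git prints."""
    result = subprocess.run(
        rev_list_count_cmd(base_tag, branch),
        cwd=repo_dir,          # hypothetical path to a local Spark checkout
        check=True,            # raise if git exits with an error
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())
```

Calling `count_patches("/path/to/spark")` (the path is a placeholder) would return the number of commits between the tag and the branch tip at that moment.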

Since we have stabilized `branch-3.0` during the 3.1.x preparation,
it would be great to start the Apache Spark 3.0.2 release next week.
I'd also like to volunteer as the release manager for Apache Spark 3.0.2.

What do you think about the Apache Spark 3.0.2 release?

Bests,
Dongjoon.


-----<Correctness Patches>-----
SPARK-31511 Make BytesToBytesMap iterator() thread-safe
SPARK-32635 When pyspark.sql.functions.lit() function is used with dataframe cache, it returns wrong result
SPARK-32753 Deduplicating and repartitioning the same column create duplicate rows with AQE
SPARK-32764 compare of -0.0 < 0.0 return true
SPARK-32840 Invalid interval value can happen to be just adhesive with the unit
SPARK-32908 percentile_approx() returns incorrect results
SPARK-33019 Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default
SPARK-33183 Bug in optimizer rule EliminateSorts
SPARK-33260 SortExec produces incorrect results if sortOrder is a Stream
SPARK-33290 REFRESH TABLE should invalidate cache even though the table itself may not be cached
SPARK-33358 Spark SQL CLI command processing loop can't exit while one command fail
SPARK-33404 "date_trunc" expression returns incorrect results
SPARK-33435 DSv2: REFRESH TABLE should invalidate caches
SPARK-33591 NULL is recognized as the "null" string in partition specs
SPARK-33593 Vector reader got incorrect data with binary partition value
SPARK-33726 Duplicate field names causes wrong answers during aggregation
SPARK-33950 ALTER TABLE .. DROP PARTITION doesn't refresh cache
SPARK-34011 ALTER TABLE .. RENAME TO PARTITION doesn't refresh cache
SPARK-34027 ALTER TABLE .. RECOVER PARTITIONS doesn't refresh cache
SPARK-34055 ALTER TABLE .. ADD PARTITION doesn't refresh cache
SPARK-34187 Use available offset range obtained during polling when checking offset validation
SPARK-34212 For parquet table, after changing the precision and scale of decimal type in hive, spark reads incorrect value
SPARK-34213 LOAD DATA doesn't refresh v1 table cache
SPARK-34229 Avro should read decimal values with the file schema
SPARK-34262 ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache
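
As a small illustration of why an item such as SPARK-32764 counts as a correctness issue, the sketch below shows the IEEE 754 behavior in plain Python (not Spark): the two zeros compare equal, so `-0.0 < 0.0` is false, even though their sign bits differ. This only demonstrates the floating-point semantics; it is not a reproduction of the Spark bug or a description of its fix.

```python
import math

neg_zero = -0.0
pos_zero = 0.0

# IEEE 754: negative and positive zero compare equal,
# so a "less than" comparison between them is false.
less_than = neg_zero < pos_zero
equal = neg_zero == pos_zero

# The sign bit is still different; copysign exposes it.
sign_of_neg_zero = math.copysign(1.0, neg_zero)

print(less_than, equal, sign_of_neg_zero)  # False True -1.0
```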

Re: Apache Spark 3.0.2 Release ?

Sean Owen
Sounds like a fine time to me, sure.

On Fri, Feb 12, 2021 at 1:39 PM Dongjoon Hyun <[hidden email]> wrote:
[...]

Re: Apache Spark 3.0.2 Release ?

Dongjoon Hyun
Thank you, Sean!

On Fri, Feb 12, 2021 at 11:41 AM Sean Owen <[hidden email]> wrote:
Sounds like a fine time to me, sure.

Re: Apache Spark 3.0.2 Release ?

Hyukjin Kwon
Yeah, +1 too

On Sat, Feb 13, 2021 at 4:49 AM Dongjoon Hyun <[hidden email]> wrote:
Thank you, Sean!

Re: Apache Spark 3.0.2 Release ?

Xiao Li
+1 

Happy Lunar New Year!

Xiao

On Fri, Feb 12, 2021 at 5:33 PM Hyukjin Kwon <[hidden email]> wrote:
Yeah, +1 too

Re: Apache Spark 3.0.2 Release ?

Takeshi Yamamuro
+1, too. Thanks, Dongjoon!

On Sat, Feb 13, 2021 at 11:07 AM Xiao Li <[hidden email]> wrote:
+1 

Happy Lunar New Year!

Xiao

Re: Apache Spark 3.0.2 Release ?

Yuming Wang
+1.

On Sat, Feb 13, 2021 at 10:38 AM Takeshi Yamamuro <[hidden email]> wrote:
+1, too. Thanks, Dongjoon!

Re: Apache Spark 3.0.2 Release ?

Holden Karau
+1, great idea.

On Fri, Feb 12, 2021 at 6:40 PM Yuming Wang <[hidden email]> wrote:
+1.

--
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

Re: Apache Spark 3.0.2 Release ?

John Zhuge
+1

On Sat, Feb 13, 2021 at 9:13 AM Holden Karau <[hidden email]> wrote:
+1, great idea.

--
John Zhuge

Re: Apache Spark 3.0.2 Release ?

Dongjoon Hyun
Thank you all! I'm preparing 3.0.2 RC1 now.

Bests,
Dongjoon.


On Sat, Feb 13, 2021 at 9:16 PM John Zhuge <[hidden email]> wrote:
+1
