Spark 3.0 preview release 2?


Spark 3.0 preview release 2?

Xiao Li

I have received a lot of great feedback from the community about the recent 3.0 preview release. Since that preview was cut, there have already been 353 commits on master [https://github.com/apache/spark/compare/v3.0.0-preview...master]. There are various important features and behavior changes that we want the community to try before we enter the official release candidates for Spark 3.0.


Below are the items I selected that are not part of the last 3.0 preview but are already available in the upstream master branch:

  • Support JDK 11 with Hadoop 2.7
  • Spark SQL will respect its own default format (i.e., parquet) when users run CREATE TABLE without USING or STORED AS clauses (exercised in the sketch after this list)
  • Enable Parquet nested schema pruning and nested pruning on expressions by default
  • Add observable metrics for streaming queries (also exercised in the sketch after this list)
  • Column pruning through nondeterministic expressions
  • RecordBinaryComparator should check endianness when comparing by longs
  • Improve parallelism for the local shuffle reader in adaptive query execution
  • Upgrade Apache Arrow to version 0.15.1
  • Various interval-related SQL support
  • Add a mode to pin the Python thread to its JVM thread
  • Provide an option to clean up completed files in a streaming query (see the streaming sketch after this list)
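
For anyone who wants to kick the tires, two of the SQL-facing items above (the CREATE TABLE default format and the observable metrics) are easy to exercise from a spark-shell built from master. The following is only a minimal sketch, not an official test: it assumes a local session, a made-up app and table name, and the API as it currently stands on master (details may still change before the release):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName("preview2-smoke-test")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // CREATE TABLE without USING or STORED AS should now create a table in
    // Spark's own default source (parquet) rather than a Hive text table.
    spark.sql("CREATE TABLE t_default (id INT, name STRING)")
    spark.sql("DESCRIBE TABLE EXTENDED t_default").show(100, false)

    // Observable metrics: named aggregate expressions are computed as the
    // query runs and are reported through the listener machinery.
    val observed = spark.range(0, 1000)
      .observe("my_metrics", count(lit(1)).as("rows"), max(col("id")).as("max_id"))
    observed.collect()

The completed-file cleanup for the file streaming source is exposed on master as a reader option; again, a hedged sketch (the option name and values are taken from the current master branch, and the input path is hypothetical):

    // cleanSource accepts "archive", "delete", or the default "off";
    // "archive" additionally requires a sourceArchiveDir to be set.
    val stream = spark.readStream
      .format("text")
      .option("cleanSource", "delete")   // remove each input file once processed
      .load("/tmp/streaming-input")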

Could we have another preview release for Spark 3.0? It would help us find design/API defects as early as possible and avoid significantly delaying the upcoming Spark 3.0 release.


Also, is any committer willing to volunteer as the release manager for the next Spark 3.0 preview release, if we have one?


Cheers,


Xiao


Re: Spark 3.0 preview release 2?

rxin
If the cost is low, why don't we just do monthly previews until we code freeze? If it is high, maybe we should discuss it and do one when people volunteer ...


Re: Spark 3.0 preview release 2?

Yuming Wang

I'd like to volunteer as the release manager for the next Spark 3.0 preview.


Re: Spark 3.0 preview release 2?

Sean Owen-2
In reply to this post by Xiao Li
Seems fine to me, of course. Honestly, that wouldn't be a bad result for a release candidate, though we would probably roll another one now. How about simply moving to a release candidate? If not now, then at least move to code freeze from the start of 2020. There is also some downside in pushing the 3.0 release out further with previews.

Re: Spark 3.0 preview release 2?

Xiao Li-2
Once we enter the official release candidates, new features have to be disabled, or even reverted if there is no conf to turn them off, whenever the fixes are not trivial; otherwise, we might need 10+ RCs to reach the final release. Based on previous discussions, new features should not block the release.
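
For instance, most of the newly enabled optimizer features keep an escape-hatch conf, so an RC-stage fix can often be flipping a flag rather than a revert. A minimal illustration, assuming the conf names as they currently appear on the master branch (they may differ in the final release):

    // Hedged sketch: turn the newly-default nested pruning features back
    // off, as one would if a non-trivial bug surfaced during the RC stage.
    spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")
    spark.conf.set("spark.sql.optimizer.expression.nestedPruning.enabled", "false")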

I agree we should have the code freeze at the beginning of 2020. The preview releases should not block the official releases; a preview is just a way to collect more feedback on these new features and behavior changes.

Also, for the Spark 3.0 release, we still need the Hive community to do us a favor and release Hive 2.3.7 so that we can pick up HIVE-22190. Before asking the Hive community for a 2.3.7 release, we would like our own community to try things out more if possible, especially JDK 11 support on Hadoop 2.7 and 3.2, which is based on the Hive 2.3 execution JAR. During the preview stage, we might find more issues that are not covered by our test cases.


Re: Spark 3.0 preview release 2?

Dongjoon Hyun-2
Thank you, All.

+1 for another `3.0-preview`.

Also, thank you Yuming for volunteering for that!

Bests,
Dongjoon.


Re: Spark 3.0 preview release 2?

Takeshi Yamamuro
+1; another preview looks great from the standpoint of gathering user feedback.

Bests,
Takeshi


Re: Spark 3.0 preview release 2?

Matei Zaharia
Yup, it would be great to release these more often.


Re: Spark 3.0 preview release 2?

Tom Graves-2
In reply to this post by Xiao Li
+1 for another preview

Tom


Re: Spark 3.0 preview release 2?

Dongjoon Hyun-2
BTW, our Jenkins seems to be behind.

1. For the first item, `Support JDK 11 with Hadoop 2.7`: at a minimum, we need a new Jenkins job, `spark-master-test-maven-hadoop-2.7-jdk-11/`.
2. https://issues.apache.org/jira/browse/SPARK-28900 (Test Pyspark, SparkR on JDK 11 with run-tests)
3. https://issues.apache.org/jira/browse/SPARK-29988 (Adjust Jenkins jobs for `hive-1.2/2.3` combination)

It would be great if we could finish the above three items before mentioning them in the release notes of the next preview.

Bests,
Dongjoon.


Re: Spark 3.0 preview release 2?

Xiao Li-2
Hi, Yuming, 

Thank you, [hidden email]! It sounds like everyone is fine with releasing a new Spark 3.0 preview. Could you start working on it?

Thanks,

Xiao 
