[Discuss][SPIP] DataSource V2 SQL push down

[Discuss][SPIP] DataSource V2 SQL push down

Chang Chen
Hi All

We would like to post an SPIP for DataSource V2 SQL push down in Spark. Here is the document link:


This SPIP aims to make push down more extensible.

I would like to thank huaxin gao; my prototype is based on her PR. I will submit a PR ASAP.

Thanks

Chang.

Re: [Discuss][SPIP] DataSource V2 SQL push down

Chang Chen
The link is broken. I have posted a PDF version instead.


Attachment: Data Source V2 SQL Push Down.pdf (231K)

Re: [Discuss][SPIP] DataSource V2 SQL push down

huaxin gao

Hello Chang,


Thanks for proposing the SPIP and initiating the discussion. However, I think the problem with your proposal is that it doesn't take file-based data sources such as Parquet and ORC into consideration. As far as I know, most Spark users rely on file-based data sources. In fact, I have customers waiting for Aggregate push down for Parquet. That is why my current implementation takes a unified Aggregate push down approach for both file-based data sources and JDBC.


I discussed with several members of the Spark community recently, and we have agreed to break down the Aggregate push down work into the following steps:

  1. Implement Max, Min and Count push down in Parquet 

  2. Add a new physical plan rewrite rule to remove the partial aggregate. As one more optimization step, we can also remove the ShuffleExchange if the group-by column and the partition column are the same (see the sketch after this list).

  3. Implement Max, Min and Count push down in JDBC

  4. Implement Sum and Avg push down in JDBC
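
As a minimal sketch of how the rewrite rule in step 2 might look, here is a self-contained Scala model. All plan node types here (PushedAggScan, PartialAgg, Shuffle, FinalAgg) are hypothetical stand-ins for Spark's physical operators, used only to illustrate the shape of the rule; this is not the actual Spark implementation.

    // Hypothetical physical operators, for illustration only.
    sealed trait ToyPlan
    case class PushedAggScan(groupBy: Seq[String], partitionCols: Seq[String]) extends ToyPlan
    case class PartialAgg(groupBy: Seq[String], child: ToyPlan) extends ToyPlan
    case class Shuffle(child: ToyPlan) extends ToyPlan
    case class FinalAgg(groupBy: Seq[String], child: ToyPlan) extends ToyPlan

    object RemovePartialAgg {
      def apply(plan: ToyPlan): ToyPlan = plan match {
        // The scan already returns pre-aggregated rows, so the partial
        // aggregate is redundant; when the grouping matches the data
        // partitioning, the shuffle is redundant as well.
        case FinalAgg(g, Shuffle(PartialAgg(_, scan: PushedAggScan))) if g == scan.partitionCols =>
          FinalAgg(g, scan)
        case FinalAgg(g, Shuffle(PartialAgg(_, scan: PushedAggScan))) =>
          FinalAgg(g, Shuffle(scan))
        case other => other
      }
    }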


I plan to implement Aggregate push down for Parquet first. The reasons are:

  1. It is relatively easier to implement Parquet Aggregate push down than JDBC:

     1. We only need to implement Max, Min and Count.

     2. We don't need to deal with the differences between Spark and other databases. For example, aggregating decimal values behaves differently across implementations: Spark widens the result of SUM over DECIMAL(p, s) to DECIMAL(p + 10, s), while other engines apply their own promotion rules.

  The main point is that we want to keep the PR minimal and support the basic infrastructure for Aggregate push down first. The PR implementing Parquet Aggregate push down is already very big, and we don't want one huge PR that solves all the problems; that would be too hard to review.


  1. I think it’s too early to implement the JDBC Aggregate push down for now. Underneath, V2 DS JDBC still calls the V1 DS JDBC path. If we implement JDBC Aggregate push down now, we still need to add a trait PrunedFilteredAggregateScan for V1 JDBC. One of the major motivations that we are having V2 DS is that we want to improve the flexibility of implementing new operator push down by avoiding adding a new push down trait. If we still add a new pushdown trait in V1 DS JDBC, I feel we are defeating the purpose of having DS V2. So I want to wait until we fully migrate to DS V2 JDBC, and then implement Aggregate push down for JDBC.


I have submitted the Parquet Aggregate push down PR. Here is the link:


https://github.com/apache/spark/pull/32049


Thanks,

Huaxin




Re: [Discuss][SPIP] DataSource V2 SQL push down

Chang Chen
Hi huaxin

What I am concerned about is the abstraction:
  1. How to extend sources.Aggregation. Because Catalyst Expressions are recursive, it would be very bad to define a new parallel hierarchy; I think the ScanBuilder must convert the pushed expressions into its own format (see the sketch below).
  2. The optimization rule is also an extension point; I didn't see any consideration of join push down. I also think SupportsPushDownRequiredColumns and SupportsPushDownFilters are problematic.
Obviously, file-based sources and SQL-based sources have quite different push down capabilities. I am not sure they can be consolidated into one API.
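
Here is a sketch of the conversion described in point 1: the ScanBuilder receives a flat (non-recursive) description of the pushed aggregate and translates it into the source's own format. PushedAggregation, AggFunc and JdbcScanBuilder are hypothetical illustration types, not Spark's API.

    // Flat, non-recursive description of the pushed aggregate (hypothetical).
    sealed trait AggFunc
    case class Min(column: String) extends AggFunc
    case class Max(column: String) extends AggFunc
    case object CountStar extends AggFunc

    case class PushedAggregation(functions: Seq[AggFunc], groupBy: Seq[String])

    class JdbcScanBuilder(table: String) {
      // Translate the flat description into the dialect this source speaks,
      // here SQL text; a Parquet builder would map it onto footer stats instead.
      def toSql(agg: PushedAggregation): String = {
        val funcs = agg.functions.map {
          case Min(c)    => s"MIN($c)"
          case Max(c)    => s"MAX($c)"
          case CountStar => "COUNT(*)"
        }
        val select = (agg.groupBy ++ funcs).mkString(", ")
        val group  = if (agg.groupBy.isEmpty) "" else agg.groupBy.mkString(" GROUP BY ", ", ", "")
        s"SELECT $select FROM $table$group"
      }
    }

For example, PushedAggregation(Seq(Max("price")), Seq("dept")) against table "t" would translate to SELECT dept, MAX(price) FROM t GROUP BY dept.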

I will push my PR tomorrow, and after that, could we schedule a meeting to discuss the API?


Re: [Discuss][SPIP] DataSource V2 SQL push down

Chang Chen
Hi huaxin,


As for adding a trait PrunedFilteredAggregateScan for V1 JDBC: I have deleted the trait, since the V1 data source needn't support aggregation push down.


Re: [Discuss][SPIP] DataSource V2 SQL push down

huaxin gao

Hi Chang,


Thanks for working on this. 


Could you please explain how your proposal can be extended to the file-based data sources? Since at least half of the Spark community uses file-based data sources, I think any design should consider them as well. I work on both SQL-based and file-based data sources, and I understand that they are very different. It's challenging to have one design that works for both, but the current filter push down and column pruning were designed nicely to fit both sides. I think we should follow the same approach to make Aggregate push down work for both too.


I am currently collaborating with the Apple Spark team and the Facebook Spark team to push down Aggregate to file-based data sources. We have ongoing work to push down Max/Min/Count to Parquet, and later to ORC, to utilize the statistics information stored there (https://github.com/apache/spark/pull/32049); a sketch of the idea follows. Please correct me if I am wrong: it seems to me that your proposal doesn't consider file-based data sources at all and would stop us from continuing our work.
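
The following is a sketch of the core idea only, not the PR's actual code: MIN/MAX/COUNT can be answered from Parquet footer metadata without scanning any data pages, using the parquet-mr footer API.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.hadoop.ParquetFileReader
    import org.apache.parquet.hadoop.util.HadoopInputFile
    import scala.collection.JavaConverters._

    object FooterStats {
      def countRows(path: String): Long = {
        val file   = HadoopInputFile.fromPath(new Path(path), new Configuration())
        val reader = ParquetFileReader.open(file)
        try {
          // The footer records a row count per row group; COUNT(*) is their
          // sum. MIN/MAX come, analogously, from each column chunk's
          // Statistics in the same footer.
          reader.getFooter.getBlocks.asScala.map(_.getRowCount).sum
        } finally reader.close()
      }
    }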


Could we schedule a meeting to discuss this?


Thanks,

Huaxin





Re: [Discuss][SPIP] DataSource V2 SQL push down

Chang Chen
Hi huaxin,

I looked into your PR; there may be a way to consolidate the file source and the SQL source. A rough shape for such a consolidated interface is sketched below.
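
One possible shape for that consolidation, sketched purely for discussion: a single V2 mix-in that both file-based and JDBC sources could implement. The Aggregation type here is a hypothetical flat description of the pushed aggregate, and the trait is not Spark's actual API.

    import org.apache.spark.sql.connector.read.ScanBuilder

    // Hypothetical flat description of the pushed aggregate (illustration only).
    case class Aggregation(functions: Seq[String], groupBy: Seq[String])

    // One possible consolidated mix-in, sketched for discussion.
    trait SupportsPushDownAggregates extends ScanBuilder {
      // Return true if the source computes the aggregate completely (e.g. a
      // JDBC source can run the whole query); return false if Spark still
      // needs a final aggregate on top (e.g. Parquet footer statistics are
      // only per row group).
      def pushAggregation(aggregation: Aggregation): Boolean
    }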

What's the time difference between Beijing and your timezone? I would prefer next Monday night or Tuesday morning.

I can share a Zoom link.


Re: [Discuss][SPIP] DataSource V2 SQL push down

huaxin gao
I am in PST; Beijing Time is 15 hours ahead of mine. Next Monday night works for me. Let's talk about the details offline. Moving spark dev and others to bcc.
