Preserving cache name and storage level upon table refresh

William Wong
Dear Spark developers, 

We noticed that the cache name can change when a table is refreshed. This is because CatalogImpl.refreshTable first uncaches the table and then recaches it (lazily) without preserving the cache name (or its storage level). IMHO, this is not what a user would expect.

I submitted a JIRA (https://issues.apache.org/jira/browse/SPARK-27248) and a PR (https://github.com/apache/spark/pull/24221) so that the cache name and storage level are preserved when a table is refreshed.
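
To make the behaviour concrete, here is a toy model in Python. It is only a sketch and not Spark's code: ToyCatalog, CacheEntry, the method names, and the assumed default storage level are all hypothetical, and the lazy recaching is ignored. It contrasts a refresh that recaches with defaults against one that remembers the name and storage level first.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed default level for this toy model only.
DEFAULT_LEVEL = "MEMORY_AND_DISK"


@dataclass
class CacheEntry:
    name: Optional[str]   # cache name attached to the cached table
    storage_level: str    # storage level as a plain string


class ToyCatalog:
    """Hypothetical stand-in for a catalog with a table cache (not Spark's API)."""

    def __init__(self) -> None:
        self._cache: Dict[str, CacheEntry] = {}

    def cache_table(self, table: str, name: Optional[str] = None,
                    storage_level: str = DEFAULT_LEVEL) -> None:
        self._cache[table] = CacheEntry(name, storage_level)

    def uncache_table(self, table: str) -> None:
        self._cache.pop(table, None)

    def lookup(self, table: str) -> Optional[CacheEntry]:
        return self._cache.get(table)

    def refresh_table_lossy(self, table: str) -> None:
        # Uncache, then recache with defaults: name and level are lost.
        if table in self._cache:
            self.uncache_table(table)
            self.cache_table(table)

    def refresh_table_preserving(self, table: str) -> None:
        # Remember name and level before uncaching, then reuse them.
        entry = self.lookup(table)
        if entry is not None:
            self.uncache_table(table)
            self.cache_table(table, entry.name, entry.storage_level)
```

In this model, a table cached as "my_cache" with level "DISK_ONLY" comes back from refresh_table_lossy with no name and the default level, while refresh_table_preserving leaves both unchanged.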

I would appreciate it if anyone who knows this part well could review it further. It is my first PR to Spark, and I hope I got it right. Any comments and suggestions are welcome. [PS: many thanks to Attila Zsolt Piros for help with the style issues.]

Thanks and regards,
William

Re: Preserving cache name and storage level upon table refresh

William Wong
Hi Sean and @gatorsmile,

Thanks a lot for your previous review. I updated the tests (https://github.com/apache/spark/pull/24221) accordingly. Could you please review them again?

Best regards,
William



On Wed, Apr 3, 2019 at 1:03 AM William Wong <[hidden email]> wrote:
Dear Spark developers, 

We noticed that the cache name can change when a table is refreshed. This is because CatalogImpl.refreshTable first uncaches the table and then recaches it (lazily) without preserving the cache name (or its storage level). IMHO, this is not what a user would expect.

I submitted a JIRA (https://issues.apache.org/jira/browse/SPARK-27248) and a PR (https://github.com/apache/spark/pull/24221) so that the cache name and storage level are preserved when a table is refreshed.

I would appreciate it if anyone who knows this part well could review it further. It is my first PR to Spark, and I hope I got it right. Any comments and suggestions are welcome. [PS: many thanks to Attila Zsolt Piros for help with the style issues.]

Thanks and regards,
William

Re: Preserving cache name and storage level upon table refresh

William Wong
Hi @gatorsmile, @cloud-fan and Sean, 

Thanks for the previous reviews and suggestions. I updated the test case to cover the storage level, fixed the typo in the migration note, and applied some other enhancements. Sorry that I accidentally reformatted QueryTest.scala and introduced many unnecessary changes that made the review difficult; those changes have been reverted.

Would you mind reviewing it again and letting me know of any issues I should continue to work on?

Thanks and regards,
William

On Tue, Apr 16, 2019 at 12:21 AM William Wong <[hidden email]> wrote:
Hi Sean and @gatorsmile,

Thanks a lot for your previous review. I updated the tests (https://github.com/apache/spark/pull/24221) accordingly. Could you please review them again?

Best regards,
William