[DISCUSS] Consistent relation resolution behavior in SparkSQL

[DISCUSS] Consistent relation resolution behavior in SparkSQL

Terry Kim
Hi all,

As discussed in SPARK-29900, Spark currently has two different relation resolution behaviors:
  1. Look up temp view first, then table/persistent view
  2. Look up table/persistent view
The first behavior is used in SELECT, INSERT, and a few commands that support temp views, such as DESCRIBE TABLE. The second behavior is used in most other commands. As a result, it is hard to predict which relation resolution rule applies to a given command.
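
To make the difference concrete, here is a minimal sketch of the two lookup orders. It is a toy model in plain Python, not Spark's analyzer code; the dictionaries and function names are assumptions made up for illustration.

```python
# Toy model of the two lookup orders described above.
# These dictionaries stand in for the session's temp views and the catalog;
# they are illustrative assumptions, not Spark data structures.
temp_views = {"t": "temp view t"}
catalog = {"t": "table default.t", "u": "table default.u"}


def resolve_temp_first(name):
    """Behavior 1: look up a temp view first, then a table/persistent view."""
    if name in temp_views:
        return temp_views[name]
    return catalog.get(name)


def resolve_catalog_only(name):
    """Behavior 2: look up a table/persistent view only; temp views are ignored."""
    return catalog.get(name)


# The same name resolves differently depending on which rule a command uses.
print(resolve_temp_first("t"))
print(resolve_catalog_only("t"))
```

With a temp view and a table both named `t`, the two functions return different relations for the same name, which is the inconsistency described above.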

I want to propose a consistent relation resolution behavior in which temp views are always looked up before tables/persistent views, as described in more detail in this doc: consistent relation resolution proposal.

Note that this proposal is a breaking change, but the impact should be minimal since it applies only when a temp view and a table share the same name.
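
As a rough illustration of why the impact is limited to name collisions, the sketch below applies the proposed temp-view-first rule and lists the catalog names whose target would change for a command that previously looked only at the catalog. This is a simplified model with hypothetical names, not Spark code.

```python
def resolve_proposed(name, temp_views, catalog):
    """Proposed rule: a temp view always wins over a table/persistent view."""
    if name in temp_views:
        return temp_views[name]
    return catalog.get(name)


def changed_names(temp_views, catalog):
    """Catalog names that a catalog-only command resolved to a table but that
    the proposed rule would resolve to a temp view instead."""
    return sorted(
        name
        for name in catalog
        if resolve_proposed(name, temp_views, catalog) != catalog[name]
    )


# Hypothetical session state: only "t" exists both as a temp view and a table.
temp_views = {"t": "temp view t", "v": "temp view v"}
catalog = {"t": "table default.t", "u": "table default.u"}

print(changed_names(temp_views, catalog))
```

In this model, only the name that exists both as a temp view and as a table changes its target; tables without a same-named temp view resolve exactly as before.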

Any feedback will be appreciated.

I also want to thank Wenchen Fan, Ryan Blue, Burak Yavuz, and Dongjoon Hyun for their guidance and suggestions.

Regards,
Terry



Re: [DISCUSS] Consistent relation resolution behavior in SparkSQL

cloud0fan
+1, I think it's good for both end-users and Spark developers:
* for end-users, when they look up a table, they don't need to care which command triggers the lookup, as the behavior is consistent everywhere.
* for Spark developers, we may simplify the code quite a bit. For now we have two code paths to look up tables: one for SELECT/INSERT and one for other commands.

Thanks,
Wenchen

On Mon, Dec 2, 2019 at 9:12 AM Terry Kim <[hidden email]> wrote:

Re: [DISCUSS] Consistent relation resolution behavior in SparkSQL

Ryan Blue
+1 for the proposal. The current behavior is confusing.

We also came up with another case that we should consider while implementing a ViewCatalog: an unresolved relation in a permanent view (from a view catalog) should never resolve to a temporary view. If I have a view `pview` defined as `select * from t1` with database `db`, then `t1` should always resolve to `db.t1` and never to a temp view `t1`. If it resolved to the temp view, then temp views could unexpectedly change the behavior of stored views.
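
A minimal sketch of that rule, as a toy model rather than ViewCatalog code: references inside a stored view skip temp views and resolve against the view's own database. The function names and the `allow_temp_views` flag are hypothetical.

```python
# Hypothetical session state: a temp view and a table share the name "t1".
temp_views = {"t1": "temp view t1"}
tables = {("db", "t1"): "table db.t1"}


def resolve(name, database, allow_temp_views):
    """Resolve a relation name, optionally letting temp views take priority."""
    if allow_temp_views and name in temp_views:
        return temp_views[name]
    return tables.get((database, name))


def resolve_in_permanent_view(name, view_database):
    """References inside a stored view never see temp views; they resolve
    against the database the view was defined with."""
    return resolve(name, view_database, allow_temp_views=False)


# A top-level query sees the temp view first...
print(resolve("t1", "db", allow_temp_views=True))
# ...but `select * from t1` inside pview still reads db.t1.
print(resolve_in_permanent_view("t1", "db"))
```

Under this model, creating or dropping a temp view `t1` cannot change what `pview` returns.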

On Wed, Dec 4, 2019 at 7:02 PM Wenchen Fan <[hidden email]> wrote:

--
Ryan Blue
Software Engineer
Netflix