[Datasource V2] Exception Handling for Catalogs - Naming Suggestions

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[Datasource V2] Exception Handling for Catalogs - Naming Suggestions

RussS
Currently the way some actions work, we receive an error during analysis phase. For example, doing a "SELECT * FROM non_existent_table" will return an analysis exception as the NoSuchTableException is captured and replaced.

Other actions like the "ShowNamespaceExec" call catalog methods directly and directly throw a NoSuchNamsSpaceException

While I don't think the difference here is a big deal, It would be nice if we could have just one set of behaviors. My interest here was being able to throw custom NoSuchTable and NoSuchNameSpace exceptions
which contained naming suggestions.

For example

SELECT * from  catalog.keyspace.tuble

Could optionally return an analysis exception which suggested other names of tables (or keyspaces) which contained like

"Could not find keyspace.tuble, found a near hit: keyspace.table"

While I have the code to do this internally within my own catalog because of previous functionality,
 I cannot surface this information to the user anymore in DSV2 because of the way the exceptions are handled. 


Given this, I'm wondering if we could wrap and rethrow the NoSuchExceptions or would it be better for catalogs to support an interfaces like


trait SupportsTableNamingSuggestions extends TableCatalog {
  val matchThreshold = 0.7
  val matcher: (String, String) => Double = JaroWinkler // Or some other string comparison algorithm

  /** Given an identifier, return a list of identifiers that it may have been mistaken for"
  default def tableSuggestions(missingIdent: Identifier): Seq[Ident] {
    for (table <- getTables(ident.namespace) if matcher(missingIdent.table, ident.table) > matchThreshold) yield {
         table
    }
  }
}

// And another for namespaces

Then on analysis failures, if the catalog supports this trait we could supplement the analysis failure with some helpful info. We could leave this as a trait for
those implementations whose listing of tables or namespaces is costly.


Thanks for your consideration, this is the kind of feature that I think is very useful to end users and we can add with pretty limited cost,
Russ
Reply | Threaded
Open this post in threaded view
|

Re: [Datasource V2] Exception Handling for Catalogs - Naming Suggestions

cloud0fan
This looks a bit specific and maybe it's better to allow catalogs to customize the error message, which is more general.

On Wed, May 13, 2020 at 12:16 AM Russell Spitzer <[hidden email]> wrote:
Currently the way some actions work, we receive an error during analysis phase. For example, doing a "SELECT * FROM non_existent_table" will return an analysis exception as the NoSuchTableException is captured and replaced.

Other actions like the "ShowNamespaceExec" call catalog methods directly and directly throw a NoSuchNamsSpaceException

While I don't think the difference here is a big deal, It would be nice if we could have just one set of behaviors. My interest here was being able to throw custom NoSuchTable and NoSuchNameSpace exceptions
which contained naming suggestions.

For example

SELECT * from  catalog.keyspace.tuble

Could optionally return an analysis exception which suggested other names of tables (or keyspaces) which contained like

"Could not find keyspace.tuble, found a near hit: keyspace.table"

While I have the code to do this internally within my own catalog because of previous functionality,
 I cannot surface this information to the user anymore in DSV2 because of the way the exceptions are handled. 


Given this, I'm wondering if we could wrap and rethrow the NoSuchExceptions or would it be better for catalogs to support an interfaces like


trait SupportsTableNamingSuggestions extends TableCatalog {
  val matchThreshold = 0.7
  val matcher: (String, String) => Double = JaroWinkler // Or some other string comparison algorithm

  /** Given an identifier, return a list of identifiers that it may have been mistaken for"
  default def tableSuggestions(missingIdent: Identifier): Seq[Ident] {
    for (table <- getTables(ident.namespace) if matcher(missingIdent.table, ident.table) > matchThreshold) yield {
         table
    }
  }
}

// And another for namespaces

Then on analysis failures, if the catalog supports this trait we could supplement the analysis failure with some helpful info. We could leave this as a trait for
those implementations whose listing of tables or namespaces is costly.


Thanks for your consideration, this is the kind of feature that I think is very useful to end users and we can add with pretty limited cost,
Russ