Adding Extension to Load Custom functions into Thriftserver/SqlShell


RussS
I've recently been looking at possible ways to load new functions into the Thriftserver and SqlShell at launch time. I basically want to preload a set of functions in addition to those already present in the Spark code. I'm not sure there is a way to do this at present, and I was wondering if anyone had any ideas.

I would basically want to make it so that any user launching either of these tools would automatically have access to some custom functions. In the SparkShell I can do this by adding additional lines to the init section, but I think it would be nice if we could pass in a parameter which would point to a class with a list of additional functions to add to all new session states.

I'm picturing an interface like SparkSessionExtensions, but instead of running during session init, it would run after session init has completed.

Thanks for your time. I would be glad to hear any opinions or ideas on this.

Mark Hamstra
You're talking about users starting Thriftserver or SqlShell from the command line, right? It's much easier if you are starting a Thriftserver programmatically: you can register functions when initializing a SparkContext and then call HiveThriftServer2.startWithContext using that context.
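
For reference, a minimal sketch of the programmatic route described above, assuming a Spark build that includes the Hive Thriftserver module. The `shout` function and the object name are made up for illustration. The `spark.sql.hive.thriftServer.singleSession` setting is included on the assumption that, without it, each JDBC connection gets its own session and would not see temporary functions registered on the starting session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ThriftServerWithFunctions {

  // Registers the custom functions on the given session.
  // "shout" is a hypothetical example function, not part of Spark.
  def registerCustomFunctions(spark: SparkSession): Unit = {
    spark.udf.register("shout", (s: String) => if (s == null) null else s.toUpperCase + "!")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("thriftserver-with-functions")
      // Assumption: share one session across JDBC connections so the
      // temporary function registered below is visible to clients.
      .config("spark.sql.hive.thriftServer.singleSession", "true")
      .enableHiveSupport()
      .getOrCreate()

    registerCustomFunctions(spark)

    // Start the Thriftserver on top of the already-initialized context.
    HiveThriftServer2.startWithContext(spark.sqlContext)
  }
}
```

This only helps when you control how the server is started; it does not cover users who launch the stock Thriftserver or SqlShell scripts from the command line.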

RussS
While that's easy for some users, we basically want to load some functions by default into all session catalogues, regardless of who made them. We already do this for certain rules and strategies using SparkSessionExtensions: all apps that run through our submit scripts get a config parameter added, and it's transparent to the user. I think we'll probably have to do some forks (at least for the CliDriver). The thriftserver also has a bunch of code which doesn't run under "startWithContext", so we may have an issue there as well.
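
As an illustration of the existing mechanism referred to here, the sketch below shows how a rule can be injected through SparkSessionExtensions and picked up via the `spark.sql.extensions` setting. The class names `ExampleExtensions` and `NoOpRule` are hypothetical, and the rule deliberately does nothing; it only shows where real rules or strategies would be wired in.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical analysis rule that leaves every plan unchanged.
case class NoOpRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Extensions entry point: Spark instantiates this class by name and
// calls apply() while building the session.
class ExampleExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectResolutionRule(session => NoOpRule(session))
  }
}
```

A submit script could then add something like `--conf spark.sql.extensions=ExampleExtensions` (using the fully qualified class name) without the user having to change their application.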


rxin
Thoughts on what the API would look like?

--
excuse the brevity and lower case due to wrist injury

RussS
It would be a @dev internal API, I think.

If we wanted to go extremely general with post session init, it could be added to SparkSessionExtensions:

def postSessionInit(session: SparkSession): Unit

which would allow you to do just about anything after sessionState has finished initializing.

Or if we specifically wanted to allow just functions

def injectFunction(name: String, function: Seq[Expression] => Expression): Unit = {
  sparkSession.registerFunction(name, function) // Placeholder call; or add to a buffer which is registered later
}
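
To make the "add to a buffer which is registered later" variant concrete, here is a toy model with no Spark dependency. Every type in it (`Expression`, `FunctionRegistry`, `Extensions`) is a hypothetical stand-in for the corresponding Spark class, so this is only a sketch of the control flow, not of the real API.

```scala
import scala.collection.mutable

object InjectFunctionSketch {
  // Hypothetical stand-ins for Catalyst expressions.
  sealed trait Expression
  final case class Literal(value: Any) extends Expression
  final case class Call(name: String, args: Seq[Expression]) extends Expression

  type FunctionBuilder = Seq[Expression] => Expression

  // Stand-in for a per-session function registry (case-insensitive names).
  final class FunctionRegistry {
    private val builders = mutable.Map.empty[String, FunctionBuilder]

    def register(name: String, builder: FunctionBuilder): Unit = {
      builders.update(name.toLowerCase, builder)
    }

    def lookup(name: String, args: Seq[Expression]): Option[Expression] =
      builders.get(name.toLowerCase).map(builder => builder(args))
  }

  // Stand-in for the extensions object: injectFunction only buffers.
  final class Extensions {
    private val injected = mutable.Buffer.empty[(String, FunctionBuilder)]

    def injectFunction(name: String, builder: FunctionBuilder): Unit = {
      injected += ((name, builder))
    }

    // Would be called once for each new session's registry.
    def registerFunctions(registry: FunctionRegistry): FunctionRegistry = {
      injected.foreach { case (name, builder) => registry.register(name, builder) }
      registry
    }
  }
}
```

Because the functions are buffered rather than registered immediately, every session created afterwards can replay the same list into its own registry.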



RussS
And in case anyone is wondering, the need for this may go away with DataSourceV2, depending on how some of the function pushdown discussions turn out. We want to add functions which work only with the Cassandra DataSource (ttl and writetime). I've done the work to add the custom expressions and analysis rules, but I want to make sure they get into the SQL interface.

RussS
I wrote a quick patch and attached it if anyone wants to think about this in context. I can always rebase this to master.


Attachment: InjectFunctions.patch (7K)

Mark Hamstra
In reply to this post by RussS
Yes, the "startWithContext" code predates SparkSessions in Thriftserver, so it doesn't really work the way you want it to with session initialization.

RussS
Yeah, I had no specific reason; BaseSessionStateBuilder is probably better. I'll open a Jira for it.

On Thu, Sep 27, 2018 at 4:47 PM Herman van Hovell <[hidden email]> wrote:
Hey Russell,

I took a quick look at your patch. I think it would be more in line with the way the current extensions work if you called the extensions in the BaseSessionStateBuilder instead of in the SessionState. Is there any reason why you want to do it like that?

Anyway, at a high level this makes sense to me. Maybe we should move it into a PR.

Cheers,
Herman

