Accessing the SQL parser


Accessing the SQL parser

Abdeali Kothari
I am writing some code to automatically find the list of tables and databases used in a Spark SQL query. Mainly I want to check the permissions and owners of all the tables a query will try to access.

Does PySpark have some method that lets me directly use the AST that Spark SQL builds?

Or is there some documentation on how I can generate and understand the AST in Spark?

Regards, 
AbdealiJK


Re: Accessing the SQL parser

Michael Shtelma
Hi AbdealiJK,

In order to get the AST, you can parse your query with the Spark parser:

LogicalPlan logicalPlan = sparkSession.sessionState()
    .sqlParser()
    .parsePlan("select * from myTable");

Afterwards you can implement your custom logic and execute the plan in this way:

Dataset<Row> ds = Dataset.ofRows(sparkSession, logicalPlan);
ds.show();

Alternatively, you can manually resolve and optimize the plan and do
something else with it afterwards:

QueryExecution queryExecution =
    sparkSession.sessionState().executePlan(logicalPlan);
SparkPlan plan = queryExecution.executedPlan();
RDD<InternalRow> rdd = plan.execute();
System.out.println("rdd.count() = " + rdd.count());

Best,
Michael


On Fri, Jan 12, 2018 at 5:39 AM, Abdeali Kothari
<[hidden email]> wrote:

