MatchPath UDF in Spark SQL


todor
Hi,

Has anyone tried to integrate or use the Hive MatchPath UDF in Spark SQL?
The Hive implementation is here: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/MatchPath.java 
Example use in Hive: https://github.com/koeninger/spark-1/blob/master/sql/hive/src/test/resources/ql/src/test/queries/clientpositive/ptf_matchpath.q 
It works perfectly in Hive, but not in Spark, and I am trying to find a way to use it in Spark or Spark SQL.

Spark does not officially support it. I tried to add it as an external UDF in spark-sql (https://forums.databricks.com/questions/9107/do-i-need-to-do-anything-more-to-use-hive-udf-in-s.html):
 
    add jar /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hive/lib/hive-exec.jar;

    CREATE TEMPORARY FUNCTION matchpath AS 'org.apache.hadoop.hive.ql.udf.ptf.MatchPath';

    SELECT sid, uid FROM matchpath (on TEMP_TABLE1 ...

Spark recognizes the function, but I get a syntax error right after matchpath:
Error in query:
mismatched input '(' expecting {<EOF>, ',', 'SELECT', 'FROM', 'ADD',

So I am not sure whether I am doing something wrong, or whether the function is simply too complex for Spark to support as a custom UDF.
In the Spark source (https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala) I found this comment:
"// List of functions we are explicitly not supporting are:
// compute_stats, context_ngrams, create_union,
// current_user, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field,
// in_file, index, matchpath ..."
But it is not clear why it is not supported.
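
To show the kind of matching I am after, here is a rough Python sketch of the idea outside of SQL. It is only an approximation and not how Hive implements MatchPath: the function name match_paths, the one-letter symbol labels, and the sample flight rows are all made up for illustration. Each row gets a one-character label from the first predicate it satisfies, and an ordinary regular expression is then run over the label string of each partition.

```python
import re
from itertools import groupby
from operator import itemgetter


def match_paths(rows, partition_key, order_key, symbols, pattern):
    """Return (partition value, matched rows) for each non-overlapping match.

    symbols: dict mapping a one-character label to a predicate on a row.
    pattern: a regular expression over those labels, e.g. "LL+".
    Rows that satisfy no predicate are labeled "-".
    """
    compiled = re.compile(pattern)
    results = []
    # Sort by partition first, then by the ordering column inside each partition.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for key, group in groupby(ordered, key=itemgetter(partition_key)):
        group = list(group)
        # One label per row, so string offsets map directly to row positions.
        labels = "".join(
            next((s for s, pred in symbols.items() if pred(r)), "-")
            for r in group
        )
        for m in compiled.finditer(labels):
            if m.end() > m.start():  # ignore empty matches
                results.append((key, group[m.start():m.end()]))
    return results


# Hypothetical sample data: find runs of two or more late arrivals per flight.
flights = [
    {"fl_num": 1, "day": 1, "arr_delay": 20},
    {"fl_num": 1, "day": 2, "arr_delay": 30},
    {"fl_num": 1, "day": 3, "arr_delay": 0},
    {"fl_num": 2, "day": 1, "arr_delay": 5},
    {"fl_num": 2, "day": 2, "arr_delay": 40},
]
paths = match_paths(
    flights, "fl_num", "day", {"L": lambda r: r["arr_delay"] > 15}, "LL+"
)
```

With this sample data, only flight 1 has two consecutive late days, so paths holds a single entry for it. Something like this could presumably be applied per group in Spark, but I would much prefer to reuse the existing Hive function.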
Any help or advice is appreciated! Thanks!

Best regards,
Todor