Re:[SQL] Syntax "case when" doesn't be supported in JOIN

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re:[SQL] Syntax "case when" doesn't be supported in JOIN

StanZhai
A workaround is diffcult.
You should consider merging this PR <https://github.com/apache/spark/pull/10128> into your Spark.



 
 "wangshuang [via Apache Spark Developers List]"<[hidden email]> wroted at 2017-07-13 18:43:

I'm trying to execute hive sql on spark sql (Also on spark thriftserver), For optimizing data skew, we use "case when" to handle null.
Simple sql as following:


SELECT a.col1
FROM tbl1 a
LEFT OUTER JOIN tbl2 b
ON
        CASE
                WHEN a.col2 IS NULL
                        TNEN cast(rand(9)*1000 - 9999999999 as string)
                ELSE
                        a.col2 END

        = b.col3;


But I get the error:

== Physical Plan ==
org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:

 (((CASE WHEN (a.`nav_tcdt` IS NULL) THEN CAST(((rand(9) * CAST(1000 AS DOUBLE)) - CAST(9999999999L AS DOUBLE)) AS STRING) ELSE a.`nav_tcdt` END = c.`site_categ_id`) AND (CAST(a.`nav_tcd` AS INT) = 9)) AND (c.`cur_flag` = 1))
in operator Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as string) ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int) = 9)) && (cur_flag#77 = 1))
               ;;
GlobalLimit 10
+- LocalLimit 10
   +- Aggregate [date_id#7, CASE WHEN (cast(city_id#10 as string) IN (cast(19596 as string),cast(20134 as string),cast(10997 as string)) && nav_tcdt#25 RLIKE ^[0-9]+$) THEN city_id#10 ELSE nav_tpa_id#21 END], [date_id#7]
      +- Filter (date_id#7 = 2017-07-12)
         +- Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as string) ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int) = 9)) && (cur_flag#77 = 1))
            :- SubqueryAlias a
            :  +- SubqueryAlias tmp_lifan_trfc_tpa_hive
            :     +- CatalogRelation `tmp`.`tmp_lifan_trfc_tpa_hive`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [date_id#7, chanl_id#8L, pltfm_id#9, city_id#10, sessn_id#11, gu_id#12, nav_refer_page_type_id#13, nav_refer_page_value#14, nav_refer_tpa#15, nav_refer_tpa_id#16, nav_refer_tpc#17, nav_refer_tpi#18, nav_page_type_id#19, nav_page_value#20, nav_tpa_id#21, nav_tpa#22, nav_tpc#23, nav_tpi#24, nav_tcdt#25, nav_tcd#26, nav_tci#27, nav_tce#28, detl_refer_page_type_id#29, detl_refer_page_value#30, ... 33 more fields]
            +- SubqueryAlias c
               +- SubqueryAlias dim_site_categ_ext
                  +- CatalogRelation `dw`.`dim_site_categ_ext`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [site_categ_skid#64L, site_categ_type#65, site_categ_code#66, site_categ_name#67, site_categ_parnt_skid#68L, site_categ_kywrd#69, leaf_flg#70L, sort_seq#71L, site_categ_srch_name#72, vsbl_flg#73, delet_flag#74, etl_batch_id#75L, updt_time#76, cur_flag#77, bkgrnd_categ_skid#78L, bkgrnd_categ_id#79L, site_categ_id#80, site_categ_parnt_id#81]

Does spark sql not support syntax "case when" in JOIN?  Additional, my spark version is 2.2.0.
Any help would be greatly appreciated.




If you reply to this email, your message will be added to the discussion below:
http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-case-when-doesn-t-be-supported-in-JOIN-tp21953.html
To start a new topic under Apache Spark Developers List, email [hidden email]
To unsubscribe from Apache Spark Developers List, click here.
NAML

Loading...