
My scenario is as follows:
        1. val df = hiveContext.sql("sql....")  (or carbonContext.sql(...))
        2. Iterate over the rows, extract two columns, id and mvcc, and use id as the key to scan HBase for the corresponding value.
            If mvcc == value, the row passes; otherwise it is dropped.
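A minimal sketch of the per-partition filter in step 2, written as plain Scala so it runs without a cluster. Everything named here is hypothetical: `Rec` stands for a row reduced to its (id, mvcc) pair, and `lookupMvcc` stands in for an HBase lookup keyed by id, backed by an in-memory Map. The one idea it shows is batching: each group of ids costs one lookup round trip instead of one per row.

```scala
// Sketch only: Rec, lookupMvcc and filterPartition are hypothetical names,
// not part of Spark or HBase.
object MvccFilterSketch {
  // A row reduced to the two columns the filter needs (assumed to be strings).
  final case class Rec(id: String, mvcc: String)

  // Hypothetical stand-in for an HBase batch lookup by id:
  // returns the stored value for every id that exists in the store.
  def lookupMvcc(store: Map[String, String])(ids: Seq[String]): Map[String, String] =
    ids.flatMap(id => store.get(id).map(id -> _)).toMap

  // Filter one partition in fixed-size batches, one lookup per batch.
  def filterPartition(
      rows: Iterator[Rec],
      lookup: Seq[String] => Map[String, String],
      batchSize: Int = 1000
  ): Iterator[Rec] =
    rows.grouped(batchSize).flatMap { batch =>
      val stored = lookup(batch.map(_.id))
      // Keep a row only if a stored value exists and equals its mvcc column.
      batch.filter(r => stored.get(r.id).contains(r.mvcc))
    }

  def main(args: Array[String]): Unit = {
    val store = Map("a" -> "1", "b" -> "2")
    val rows = Iterator(Rec("a", "1"), Rec("b", "3"), Rec("c", "1"))
    val kept = filterPartition(rows, lookupMvcc(store), batchSize = 2).toList
    println(kept) // List(Rec(a,1))
  }
}
```

In a real job this would presumably sit inside `df.rdd.mapPartitions { it => ... }`, mapping each `Row` to `Rec` (e.g. with `row.getAs[String]("id")`) and opening one HBase connection per partition, with the lookup done through the client's batch `Table.get(List<Get>)`. That wiring is an assumption and is not shown here.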
Is there a better way to do this than dataframe.mapPartitions? mapPartitions causes an extra stage and takes more time.
I have attached two DAGs; please take a look.

