appendix

appendix

sunerhan1992@sina.com
Hello,
My scenario is like this:
        1. val df = hiveContext/carbonContext.sql("sql....")
        2. Iterate over the rows, extracting two columns, id and mvcc; use id as the key to look up the corresponding value in HBase. If mvcc == value, the row passes, else it is dropped.
Is there a better way than dataframe.mapPartitions? It causes an extra stage and takes more time.
I put two DAGs in the appendix, please check!
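For reference, a minimal sketch of the mapPartitions approach I mean (table name, column family, and connection handling are placeholders, not my real code):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
    import org.apache.hadoop.hbase.util.Bytes

    // One HBase connection per partition; "my_table" and "cf" are placeholders.
    val filteredRdd = df.rdd.mapPartitions { rows =>
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("my_table"))
      // Materialize the partition so the connection can be closed before returning.
      val kept = rows.filter { row =>
        val id = row.getAs[String]("id")
        val mvcc = row.getAs[String]("mvcc")
        val bytes = table.get(new Get(Bytes.toBytes(id)))
          .getValue(Bytes.toBytes("cf"), Bytes.toBytes("mvcc"))
        bytes != null && mvcc == Bytes.toString(bytes)  // keep only matching rows
      }.toList
      table.close(); conn.close()
      kept.iterator
    }
    val filteredDf = sqlContext.createDataFrame(filteredRdd, df.schema)

The round trip through the RDD API is where the extra stage comes from, which is why I am asking for an alternative.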

Thanks!!




Attachment: appendix.zip (396K)

Re: appendix

cloud0fan
You should make HBase a data source (it seems we already have an HBase connector?), create a DataFrame from HBase, and do the join in Spark SQL.
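A rough sketch of what that could look like with the shc connector (the catalog mapping, table name, and column family are assumptions for illustration):

    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    // Hypothetical mapping: the row key is the id, mvcc lives in family "cf".
    val catalog = """{
      "table":{"namespace":"default", "name":"my_table"},
      "rowkey":"key",
      "columns":{
        "id":{"cf":"rowkey", "col":"key", "type":"string"},
        "hbase_mvcc":{"cf":"cf", "col":"mvcc", "type":"string"}
      }
    }"""

    val hbaseDf = sqlContext.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()

    // Keep only the rows whose mvcc matches the value stored in HBase.
    val result = df.join(hbaseDf,
        df("id") === hbaseDf("id") && df("mvcc") === hbaseDf("hbase_mvcc"))
      .select(df("*"))

This turns the per-row gets into a join that the optimizer can plan.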



Re: Re: appendix

sunerhan1992@sina.com
I tried another KV database, Ignite, joining through IgniteContext, which made things more complicated (multiple contexts), and its performance was not good. I believe using an HBase connector would have the same problem.
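For context, this is roughly what I tried (cache name and configuration are placeholders):

    import org.apache.ignite.configuration.IgniteConfiguration
    import org.apache.ignite.spark.IgniteContext

    // A second context alongside the SQL context, which is what made it complicated.
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())
    val mvccDf = sqlContext.createDataFrame(
      ic.fromCache[String, String]("mvccCache")  // id -> expected mvcc
    ).toDF("id", "expected_mvcc")
    val result = df.join(mvccDf, "id").where(df("mvcc") === mvccDf("expected_mvcc"))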
