HBaseContext with Spark


HBaseContext with Spark

Chetan Khatri
Hello Spark Community Folks,

Currently I am using HBase 1.2.4 and Hive 1.2.1, and I am looking to bulk load data from HBase to Hive.

I have seen a couple of good examples in the HBase GitHub repo: https://github.com/apache/hbase/tree/master/hbase-spark

If I would like to use HBaseContext with HBase 1.2.4, how can it be done? Or which version of HBase is more stable with HBaseContext?

Thanks.

Re: HBaseContext with Spark

Amrit Jangid
Hi Chetan,

If you just need HBase data in Hive, you can use a Hive EXTERNAL TABLE with
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.

Try this and see whether it solves your problem:

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
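
For illustration, a minimal sketch of that approach. It assumes a hypothetical HBase table named users with a column family info; the table, column, and family names are made up for this example and do not come from the thread.

```sql
-- Hypothetical names: HBase table "users", column family "info".
-- Maps the HBase row key and two qualifiers onto Hive columns.
CREATE EXTERNAL TABLE hbase_users (
  rowkey STRING,
  name   STRING,
  email  STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:email")
TBLPROPERTIES ("hbase.table.name" = "users");

-- Optionally copy the data into a native Hive table (a full scan of the HBase table).
CREATE TABLE users_hive STORED AS ORC AS
SELECT * FROM hbase_users;
```

The external table only points at HBase, so queries against it read live HBase data; the second statement is what actually materializes a copy inside Hive.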

Regards
Amrit


On Wed, Jan 25, 2017 at 5:02 PM, Chetan Khatri <[hidden email]> wrote:
Hello Spark Community Folks,

Currently I am using HBase 1.2.4 and Hive 1.2.1, and I am looking to bulk load data from HBase to Hive.

I have seen a couple of good examples in the HBase GitHub repo: https://github.com/apache/hbase/tree/master/hbase-spark

If I would like to use HBaseContext with HBase 1.2.4, how can it be done? Or which version of HBase is more stable with HBaseContext?

Thanks.


 

Re: HBaseContext with Spark

Ted Yu
In reply to this post by Chetan Khatri
Though no HBase release includes the hbase-spark module, you can find the backport patch on HBASE-14160 (for Spark 1.6).

You can build the hbase-spark module yourself.

Cheers


Re: HBaseContext with Spark

Chetan Khatri
@Ted Yu, correct, but the HBase-Spark module available in the HBase repository seems too old and its code is not optimized yet; I have already submitted a PR for this. I don't know whether it is clearly stated that the module is now part of HBase itself, because people are still committing to the older repo, where the original code remains outdated. [1]

Other sources have updated info [2]

Ref.

On Wed, Jan 25, 2017 at 8:13 PM, Ted Yu <[hidden email]> wrote:
Though no HBase release includes the hbase-spark module, you can find the backport patch on HBASE-14160 (for Spark 1.6).

You can build the hbase-spark module yourself.

Cheers


Re: HBaseContext with Spark

Ted Yu
The references are vendor-specific.

I suggest contacting the vendor's mailing list about your PR.

I had initially taken "HBase repository" to mean the Apache one.

Cheers

On Wed, Jan 25, 2017 at 7:38 AM, Chetan Khatri <[hidden email]> wrote:
@Ted Yu, correct, but the HBase-Spark module available in the HBase repository seems too old and its code is not optimized yet; I have already submitted a PR for this. I don't know whether it is clearly stated that the module is now part of HBase itself, because people are still committing to the older repo, where the original code remains outdated. [1]

Other sources have updated info [2]

Ref.


Re: HBaseContext with Spark

Ted Yu
In reply to this post by Amrit Jangid
Does the storage handler provide bulk load capability?

Cheers

On Jan 25, 2017, at 3:39 AM, Amrit Jangid <[hidden email]> wrote:

Hi Chetan,

If you just need HBase data in Hive, you can use a Hive EXTERNAL TABLE with
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.

Try this and see whether it solves your problem:

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

Regards
Amrit


Re: HBaseContext with Spark

Chetan Khatri
@Ted, I don't think so.

On Thu, Jan 26, 2017 at 6:35 AM, Ted Yu <[hidden email]> wrote:
Does the storage handler provide bulk load capability?

Cheers


Re: HBaseContext with Spark

Chetan Khatri
Storage handler bulk load:

SET hive.hbase.bulk=true;
INSERT OVERWRITE TABLE users SELECT … ;

But for now, you have to do some work and issue multiple Hive commands:

- Sample source data for range partitioning
- Save sampling results to a file
- Run CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner (sorts data, producing a large number of region files)
- Import HFiles into HBase
- HBase can merge files if necessary
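
The CLUSTER BY step in the list above might look roughly like the following sketch, modeled on the Hive HBaseBulkLoad wiki page. The table names (transactions, hbsort), the columns, the paths, the column family cf, and the reducer count are all hypothetical, and the sketch assumes the sampling step has already written the range keys to the partition file.

```sql
-- Assumption: 11 sampled range keys were saved to this file, so 12 reducers are needed
-- (TotalOrderPartitioner requires one more reducer than there are split keys).
SET mapred.reduce.tasks=12;
SET hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
SET total.order.partitioner.path=/tmp/hb_range_key_list;

-- Hypothetical staging table whose output format writes HFiles.
-- The last component of hfile.family.path names the HBase column family (here: cf).
CREATE TABLE hbsort (transaction_id STRING, user_name STRING, amount DOUBLE)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/cf');

-- Sort rows by key so each reducer emits one HFile covering a contiguous key range.
INSERT OVERWRITE TABLE hbsort
SELECT transaction_id, user_name, amount
FROM transactions
CLUSTER BY transaction_id;
```

The resulting HFiles under /tmp/hbsort would then be imported with HBase's bulk load tool, which is the "Import HFiles into HBase" step. Note that this path loads data from Hive into HBase, not the other way around.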

On Sat, Jan 28, 2017 at 11:32 AM, Chetan Khatri <[hidden email]> wrote:
@Ted, I don't think so.
