How to pre-split HBase regions when creating a table_lintj2006

http://blog.sina.com.cn/u/2072506277



(2016-07-14 16:27:31)

Tags: hbase

Category: hadoop

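If you know how the row keys of an HBase table are distributed, you can pre-split the table into regions when you create it. Otherwise a new table starts with a single region, so every write goes to one RegionServer until that region splits. Pre-splitting avoids this hotspot during large data loads and improves write throughput.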

Steps:

1. Plan the HBase pre-splits

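First, work out how the row keys are distributed. Then decide how many regions to create and what the start key and stop key of each region should be, and write the planned split keys to a file. For example, if the first few characters of every key are a number from 0001 to 0010, you can split the table into 10 regions, and the split-key file looks like this: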


  0001|
  0002|
  0003|
  0004|
  0005|
  0006|
  0007|
  0008|
  0009|
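Typing this file by hand becomes error-prone once there are more than a few regions, so a short script can generate it. The Python sketch below is one way to do that. It assumes, as in the example, a zero-padded 4-digit prefix from 0001 to 0010, and the helper names make_split_keys and write_split_file are hypothetical. As far as I know, the HBase shell turns each line of SPLITS_FILE into one split key after stripping only the line break, so the script is careful not to write trailing spaces.

```python
from pathlib import Path
from typing import List


def make_split_keys(first: int, last: int, width: int = 4, suffix: str = "|") -> List[str]:
    """Split points for key prefixes first..last (hypothetical helper).

    Prefixes first..last give (last - first + 1) regions, which need
    (last - first) split points: first|, ..., (last - 1)|, each
    zero-padded to `width` digits.
    """
    return [f"{i:0{width}d}{suffix}" for i in range(first, last)]


def write_split_file(path: str, keys: List[str]) -> None:
    """Write one split key per line, with no trailing spaces."""
    Path(path).write_text("\n".join(keys) + "\n", encoding="utf-8")


if __name__ == "__main__":
    keys = make_split_keys(1, 10)
    print(len(keys), "split points ->", len(keys) + 1, "regions")
    print("\n".join(keys))
```

Calling write_split_file("region_split_info.txt", make_split_keys(1, 10)) writes the nine lines 0001| through 0009|.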



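Why does each line end with a "|"? In ASCII, "|" has the value 124, which is greater than every digit and letter, so a key such as 0001abc still sorts before the split point 0001|. "~" (ASCII 126) would work as well. The first line of the split file is the stop key of the first region, each following line is the stop key of the next region, and the last line is both the stop key of the second-to-last region and the start key of the last region. A region's stop key is exclusive: the region holds keys up to, but not including, that key. In other words, the file lists only the split points that divide the key range, as shown in the figure below: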
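The effect of the "|" can be checked with a small simulation. The sketch below uses Python's bisect module to find which region a row key would fall into, treating each split point as the inclusive start key of one region and the exclusive stop key of the previous one. The helper region_index and the sample keys are hypothetical; Python compares strings by code point, which for ASCII keys matches HBase's byte-wise ordering.

```python
from bisect import bisect_right
from typing import List

SPLITS_WITH_BAR = [f"{i:04d}|" for i in range(1, 10)]  # 0001| .. 0009|
SPLITS_PLAIN = [f"{i:04d}" for i in range(1, 10)]      # 0001 .. 0009 (no '|')


def region_index(splits: List[str], row_key: str) -> int:
    """0-based index of the region that would hold row_key (hypothetical helper).

    Region i covers [splits[i-1], splits[i]): start key inclusive, stop key
    exclusive. Region 0 has no start key; the last region has no stop key.
    """
    return bisect_right(splits, row_key)


if __name__ == "__main__":
    for key in ["0001abc", "0002xyz", "0009zzz", "0010abc"]:
        print(key, region_index(SPLITS_WITH_BAR, key), region_index(SPLITS_PLAIN, key))
```

With the "|" split points each prefix gets its own region (0001abc goes to region 0, 0009zzz to region 8, 0010abc to region 9). With plain split points, 0001abc lands in region 1 and both 0009zzz and 0010abc land in region 9, leaving region 0 empty for these keys. This only holds while the character after the prefix sorts below "|".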

[Figure: the split points divide the row-key range into regions] http://img.blog.csdn.net/20150605144842477

2. Create the pre-split table in the HBase shell, specifying the split file

Entering create by itself in the HBase shell displays the following help:


Examples:  

Create a table with namespace=ns1 and table qualifier=t1  

  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}  

Create a table with namespace=default and table qualifier=t1  

  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}  

  hbase> # The above in shorthand would be the following:  

  hbase> create 't1', 'f1', 'f2', 'f3'  

  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}  

  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}  

Table configuration options can be put at the end.  

Examples:  

  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']  

  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']  

  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'  

  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }  

  hbase> # Optionally pre-split the table into NUMREGIONS, using  

  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)  

  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}  

  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}  

  hbase> create 't1', {NAME => 'f1'}, {NAME => 'if1', LOCAL_INDEX=>'COMBINE_INDEX|INDEXED=f1:q1:8|rowKey:rowKey:10,UPDATE=true'}
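The NUMREGIONS/SPLITALGO form computes the split points itself instead of reading them from a file. As a rough illustration of the idea behind HexStringSplit (a simplified sketch, not HBase's actual code, which may differ in details), the 8-character hex key space 00000000 to ffffffff is cut into NUMREGIONS equal slices. This only balances load when row keys are themselves roughly uniform hex strings, such as hashed prefixes, which is not the case for the 0001~0010 keys in step 1. The function name hex_even_splits is hypothetical.

```python
from typing import List


def hex_even_splits(num_regions: int, width: int = 8) -> List[str]:
    """Evenly spaced split points over fixed-width hex keys (simplified sketch).

    Treats keys as lowercase hex strings of `width` characters covering
    0 .. 16**width - 1 and returns num_regions - 1 split points.
    """
    if num_regions < 2:
        raise ValueError("num_regions must be at least 2")
    max_key = 16 ** width - 1          # ffffffff for width 8
    step = max_key // num_regions      # size of each slice
    return [format(step * i, f"0{width}x") for i in range(1, num_regions)]


if __name__ == "__main__":
    print(hex_even_splits(15))
```

For 15 regions the slice size is exactly 0x11111111, so the split points run 11111111, 22222222, ..., eeeeeeee.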

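You can name a split file with SPLITS_FILE; if there are only a few split points, you can also list them directly with SPLITS. The following command creates a pre-split table using the split file from step 1: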


create 'split_table_test', 'cf', {SPLITS_FILE => 'region_split_info.txt'}

What if we also want the table to use SNAPPY compression?


create 'split_table_test',{NAME =>'cf', COMPRESSION => 'SNAPPY'}, {SPLITS_FILE => 'region_split_info.txt'}

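Note that the split option must be enclosed in its own pair of braces, separate from the column family's braces, because splitting applies to the whole table rather than to a single column family.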
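When the split points come from a script, it can be convenient to emit the whole create command as well. The hypothetical helper below only formats text (it does not connect to HBase) and keeps the column-family options (NAME, COMPRESSION) and the table-level split option in separate braces, as required above. It accepts either a split file path or a short inline list for SPLITS.

```python
from typing import List, Optional


def build_create_command(table: str, family: str,
                         compression: Optional[str] = None,
                         splits: Optional[List[str]] = None,
                         splits_file: Optional[str] = None) -> str:
    """Format an HBase shell `create` statement as text (hypothetical helper).

    Exactly one of `splits` (inline SPLITS list) or `splits_file`
    (SPLITS_FILE path) must be given. Nothing is sent to HBase.
    """
    if (splits is None) == (splits_file is None):
        raise ValueError("pass exactly one of splits or splits_file")

    # Column-family options live inside the family's own braces.
    cf_opts = [f"NAME => '{family}'"]
    if compression:
        cf_opts.append(f"COMPRESSION => '{compression}'")
    cf_part = "{" + ", ".join(cf_opts) + "}"

    # Split options apply to the whole table, so they get separate braces.
    if splits_file is not None:
        table_part = "{SPLITS_FILE => '" + splits_file + "'}"
    else:
        quoted = ", ".join(f"'{s}'" for s in splits)
        table_part = "{SPLITS => [" + quoted + "]}"

    return f"create '{table}', {cf_part}, {table_part}"


if __name__ == "__main__":
    print(build_create_command("split_table_test", "cf", "SNAPPY",
                               splits_file="region_split_info.txt"))
```

For the SNAPPY example this prints the same command as above, apart from whitespace.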

Next, open the HMaster web UI, find the newly created pre-split table in the table list, and open it to view its regions:

[Figure: region list of the pre-split table in the HMaster web UI] http://img.blog.csdn.net/20150605145011779

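As you can see, the first region has no start key and the last region has no stop key: they cover everything below the first split point and everything from the last split point upward.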
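This layout follows directly from the split file: N split points produce N + 1 regions, with an open-ended region at each end. A minimal sketch (the helper region_ranges is hypothetical) that lists the expected start/stop keys, so they can be compared with what the web UI shows, using an empty string for "no boundary" as the UI does:

```python
from typing import List, Tuple


def region_ranges(splits: List[str]) -> List[Tuple[str, str]]:
    """Turn N sorted split points into N + 1 (start_key, stop_key) pairs.

    An empty string means "unbounded": the first region has no start key
    and the last region has no stop key.
    """
    bounds = [""] + list(splits) + [""]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    splits = [f"{i:04d}|" for i in range(1, 10)]  # 0001| .. 0009|
    for start, stop in region_ranges(splits):
        print(repr(start), "->", repr(stop))
```

For the nine split points 0001| to 0009| this yields ten regions, from ('', '0001|') to ('0009|', '').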
