HBase region server memory sizing

Source: 动视网 | Editor: 小采 | Date: 2020-11-09 13:24:25

Sizing a machine for HBase is somewhat of a black art.

Unlike a pure storage machine that would just be optimized for disk size and throughput, an HBase RegionServer is also a compute node.

Every byte of disk space needs to be matched with a fraction of a byte in the RegionServer's Java heap.

You can estimate the ratio of raw disk space to required Java heap as follows:

RegionSize / MemstoreSize *
ReplicationFactor * HeapFractionForMemstores

Or in terms of HBase/HDFS configuration parameters:

hbase.hregion.max.filesize /
hbase.hregion.memstore.flush.size *
dfs.replication *
hbase.regionserver.global.memstore.lowerLimit

Say you have the following parameters (these are the defaults in 0.94):

  • 10GB regions
  • 128MB memstores
  • HDFS replication factor of 3
  • 40% of the heap used for the memstores

Then: 10GB / 128MB * 3 * 0.4 = 96.

Now think about this. With the default settings, this means that if you wanted to serve 10T worth of disk space per region server, you would need a 107GB Java heap!
Or, if you give a region server a 10G heap, you can only utilize about 1T of disk space per region server machine.
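
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name `disk_to_heap_ratio` and its argument names are assumptions made here, not part of HBase, and the values are typed in by hand rather than read from any configuration file.

```python
GB = 1024 ** 3
MB = 1024 ** 2
TB = 1024 ** 4


def disk_to_heap_ratio(region_size, memstore_size, replication, heap_fraction):
    """Bytes of raw disk that one byte of RegionServer heap can serve.

    Hypothetical helper. The arguments stand for
    hbase.hregion.max.filesize, hbase.hregion.memstore.flush.size,
    dfs.replication and hbase.regionserver.global.memstore.lowerLimit.
    """
    return region_size / memstore_size * replication * heap_fraction


# Parameters from the example: 10GB regions, 128MB memstores, 3 replicas, 40%.
ratio = disk_to_heap_ratio(10 * GB, 128 * MB, 3, 0.4)

# Heap needed to serve 10T of raw disk, in GB (a little under 107).
heap_for_10t_gb = 10 * TB / ratio / GB

# Raw disk that a 10G heap can serve, in TB (a little under 1).
disk_for_10g_heap_tb = 10 * GB * ratio / TB

print(f"ratio: {ratio:.0f}")
print(f"heap for 10T of disk: {heap_for_10t_gb:.1f} GB")
print(f"disk for a 10G heap: {disk_for_10g_heap_tb:.2f} TB")
```

The sketch uses binary units throughout (1T = 1024G), which is why the heap figure lands just below the rounded 107GB quoted above.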

Most people are surprised by this. I know I was.

Let's double check:
In order to serve 10T worth of raw disk space (3.3T of effective space after 3-way replication) with 10GB regions, you'd need ~338 regions. At 128MB each, that's about 43GB of memstores. But by default only 40% of the heap is used for the memstores, so what you actually need is 43GB / 0.4 ~ 107GB. Yep, it's right.
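
The same double check can be written out step by step, counting regions first and then memstores. The function `required_heap` is a hypothetical helper introduced for this sketch. Because it uses binary units and does not round 10T / 3 down to 3.3T, it arrives at about 342 regions rather than ~338, with the heap still coming out at roughly 107GB.

```python
import math

GB = 1024 ** 3
MB = 1024 ** 2
TB = 1024 ** 4


def required_heap(raw_disk, region_size, memstore_size, replication, heap_fraction):
    """Return (region count, heap bytes) needed to serve raw_disk bytes.

    Hypothetical helper that follows the double check in the text:
    raw disk -> effective space -> regions -> memstore bytes -> heap.
    """
    effective = raw_disk / replication            # space left after replication
    regions = math.ceil(effective / region_size)  # one memstore per region assumed
    memstore_total = regions * memstore_size      # all memstores full
    return regions, memstore_total / heap_fraction


regions, heap = required_heap(10 * TB, 10 * GB, 128 * MB, 3, 0.4)
heap_gb = heap / GB

print(f"regions: {regions}")
print(f"heap: {heap_gb:.1f} GB")
```

Note the simplifying assumption of one full memstore per region, which matches the estimate in the text; tables with several column families have one memstore per family.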

Maybe we can get away with a bit less by assuming that not all memstores are 100% full at all times. That is offset by the fact that not all regions will be exactly the same size or 100% filled.

Now, what can you do?
There are several options:

1. Increase the region size. 20GB is about the maximum, although some people claim they have 200GB regions. (hbase.hregion.max.filesize)
2. Decrease the memstore size. Depending on your write load you can go smaller, 64MB or even less. (hbase.hregion.memstore.flush.size)
   You can allow a memstore to grow beyond this size temporarily. (hbase.hregion.memstore.block.multiplier)
3. Increase the HDFS replication factor. That does not really help per se, but if you have more disk space than you can utilize, increasing the replication factor would at least put your disks to good use.
4. Fiddle with the heap fraction used for the memstores. If your load is write-heavy, maybe raise it to 50% of the heap. (hbase.regionserver.global.memstore.upperLimit, hbase.regionserver.global.memstore.lowerLimit)

These parameters (except the replication factor, which is an HDFS setting) are described in hbase-default.xml, which ships with HBase.
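
To see how much each option moves the disk-to-heap ratio, the following sketch changes one setting at a time against the defaults used earlier. The scenario names and the helper `disk_to_heap_ratio` are illustrative assumptions; the alternative values (20GB, 64MB, 50%) come from the list above, and the replication factor of 4 is simply picked as an example.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def disk_to_heap_ratio(region_size, memstore_size, replication, heap_fraction):
    """Bytes of raw disk served per byte of heap (hypothetical helper)."""
    return region_size / memstore_size * replication * heap_fraction


defaults = {
    "region_size": 10 * GB,     # hbase.hregion.max.filesize
    "memstore_size": 128 * MB,  # hbase.hregion.memstore.flush.size
    "replication": 3,           # dfs.replication
    "heap_fraction": 0.4,       # heap share for memstores
}

# Each scenario overrides exactly one default.
scenarios = {
    "defaults": {},
    "20GB regions": {"region_size": 20 * GB},
    "64MB memstores": {"memstore_size": 64 * MB},
    "replication factor 4": {"replication": 4},
    "50% heap for memstores": {"heap_fraction": 0.5},
}

ratios = {
    name: disk_to_heap_ratio(**{**defaults, **override})
    for name, override in scenarios.items()
}

for name, ratio in ratios.items():
    print(f"{name:24s} {ratio:5.0f}")
```

Doubling the region size or halving the memstore size each doubles the ratio, while the other two options give smaller gains.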

Personally, I would place the maximum disk space per machine that can be served exclusively with HBase at around 6T, unless you have a very read-heavy workload.
For 6T, the Java heap should be 32GB (20G regions, 128M memstores, the rest defaults). With MSLAB in 0.94 that works.
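
As a quick check of the 32GB figure, here is the formula applied to 6T of raw disk with 20G regions and the remaining settings as in the earlier example. The variable names are illustrative only.

```python
GB = 1024 ** 3
MB = 1024 ** 2
TB = 1024 ** 4

# 20G regions, 128M memstores, replication 3, 40% of heap for memstores.
ratio = (20 * GB) / (128 * MB) * 3 * 0.4

# Heap needed for 6T of raw disk, in GB.
heap_gb = 6 * TB / ratio / GB

print(f"ratio: {ratio:.0f}, heap: {heap_gb:.0f} GB")
```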

Of course, your needs may vary. You may have a mostly read-only load, in which case you can shrink the memstores. Or the disk space might be shared with other applications.
Maybe you need smaller regions or larger memstores. In that case the maximum disk space you can serve per machine would be less.

Future JVMs might support bigger heaps effectively (JDK7's G1 comes to mind).

In any case, the formula above provides a reasonable starting point.
