Sizing a machine for HBase is somewhat of a black art. Unlike a pure storage machine that would just be optimized for disk size and throughput, an HBase RegionServer is also a compute node.
Every byte of disk space needs to be matched with a fraction of a byte in the RegionServer's Java heap.
You can estimate the ratio of raw disk space to required Java heap as follows:
RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstores
Or in terms of HBase/HDFS configuration parameters:
hbase.hregion.max.filesize /
hbase.hregion.memstore.flush.size *
dfs.replication *
hbase.regionserver.global.memstore.lowerLimit
Say you have the following parameters (these are the defaults in 0.94):
10GB regions
128MB memstores
HDFS replication factor of 3
40% of the heap used for the memstores
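As a rough sketch, the formula can be evaluated with the values this post uses for the 0.94 defaults (10GB regions, 128MB memstores, 3-way replication, 40% of the heap for memstores). The function and variable names below are illustrative only and are not part of HBase.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def disk_to_heap_ratio(region_size, memstore_size, replication, memstore_heap_fraction):
    """Bytes of raw disk that one byte of RegionServer heap can back."""
    return region_size / memstore_size * replication * memstore_heap_fraction


ratio = disk_to_heap_ratio(
    region_size=10 * GB,         # hbase.hregion.max.filesize
    memstore_size=128 * MB,      # hbase.hregion.memstore.flush.size
    replication=3,               # dfs.replication
    memstore_heap_fraction=0.4,  # share of the heap reserved for memstores
)

heap_for_10tb = 10 * TB / ratio  # heap needed to serve 10T of raw disk
disk_for_10gb = 10 * GB * ratio  # raw disk that a 10G heap can serve

print(f"disk:heap ratio      = {ratio:.0f}")
print(f"heap for 10T of disk = {heap_for_10tb / GB:.0f}GB")
print(f"disk for a 10G heap  = {disk_for_10gb / GB:.0f}GB")
```

With these inputs the ratio comes out at 96 bytes of raw disk per byte of heap, which is where the roughly 107GB and roughly 1T figures come from.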
Now think about this. With the default settings this means that if you wanted to serve 10T worth of disk space per region server, you would need a 107GB Java heap!
Or, if you give a region server a 10G heap, you can only utilize about 1T of disk space per region server machine.
Most people are surprised by this. I know I was.
Let's double check:
In order to serve 10T worth of raw disk space - 3.3T of effective space after 3-way replication - with 10GB regions, you'd need ~338 regions. At 128MB of memstore per region that's about 43GB. But by default only 40% of the heap is used for the memstores, so what you actually need is 43GB/0.4 ~ 107GB. Yep, it's right.
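The same double check can be written out step by step. This is only a sketch: it assumes one full 128MB memstore per region (that is, a single column family), and the variable names are made up for illustration. Without the intermediate rounding to 3.3T it lands on about 341 regions rather than 338, but the heap estimate is the same.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

raw_disk = 10 * TB
effective = raw_disk / 3              # space left after 3-way HDFS replication
regions = effective / (10 * GB)       # number of 10GB regions that fit
memstore_total = regions * 128 * MB   # assumes one full 128MB memstore per region
heap = memstore_total / 0.4           # memstores may use 40% of the heap

print(f"regions        ~ {regions:.0f}")
print(f"memstore total ~ {memstore_total / GB:.0f}GB")
print(f"heap needed    ~ {heap / GB:.0f}GB")
```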
Maybe we can get away with a bit less by assuming that not all memstores are 100% full at all times. That is offset by the fact that not all regions will be exactly the same size or 100% filled.
Now. What can you do?
There are several options:
Personally I would place the maximum disk space per machine that can be served exclusively with HBase around 6T, unless you have a very read-heavy workload.
For 6T the Java heap should be 32GB (20G regions, 128M memstores, the rest defaults). With MSLAB in 0.94 that works.
Of course your needs may vary. You may have a mostly read-only load, in which case you can shrink the memstores. Or the disk space might be shared with other applications.
Maybe you need smaller regions or larger memstores. In that case the maximum disk space you can serve per machine would be less.
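To see how these knobs move the numbers, here is a small sketch that turns the formula around and asks how much raw disk a 32GB heap can serve. The helper name `max_disk` and the scenario labels are hypothetical; replication 3 and a 40% memstore share are assumed to be left at the defaults used above.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def max_disk(heap, region_size, memstore_size, replication=3, memstore_heap_fraction=0.4):
    """Raw disk space that a heap of the given size can serve, per the formula above."""
    return heap * region_size / memstore_size * replication * memstore_heap_fraction


heap = 32 * GB
scenarios = {
    "20G regions, 128M memstores": (20 * GB, 128 * MB),
    "10G regions, 128M memstores": (10 * GB, 128 * MB),  # smaller regions
    "20G regions, 256M memstores": (20 * GB, 256 * MB),  # larger memstores
    "20G regions,  64M memstores": (20 * GB, 64 * MB),   # smaller memstores, read-mostly load
}

for label, (region_size, memstore_size) in scenarios.items():
    disk = max_disk(heap, region_size, memstore_size)
    print(f"{label}: {disk / TB:.0f}T")
```

Under these assumptions, halving the region size or doubling the memstore size each cuts the servable disk space from 6T to 3T, while halving the memstore size doubles it to 12T.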
Future JVMs might support bigger heaps effectively (JDK7's G1 comes to mind).
In any case, the formula above provides a reasonable starting point.
Original article: HBase region server memory sizing. Thanks to the original author for sharing.