We’ve tried loading the system with 10,000, 100,000 and 200,000 files, and everything worked perfectly.
We then moved on to running the full benchmark (1,100,000 files). After approximately 400,000 files, the HBase RegionServer began to falter:
First of all, it seems that one RegionServer (out of 2) was doing almost 90% of the work. Unsurprisingly, it is the one that faltered. (Apparently there is nothing to do about it: as long as the META region is under a specific size, it is held on a single RegionServer (RS). Once there is enough data, the region splits, and then the load becomes more balanced.)
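To see why a single unsplit region pins all the work on one server, here is a toy model, not HBase's actual split logic. A region covers a contiguous row-key range and is served by exactly one RS. Once it grows past a threshold it splits at its middle key into two daughters, and those daughters can then be spread across servers. The `Region` class, the `max_rows` threshold (HBase actually triggers splits on a configured region size, not a row count), the round-robin assignment and the use of row count as a stand-in for request load are all simplifying assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    """Toy region: holds the row keys in [start, end); end == "" means unbounded."""
    start: str
    end: str
    rows: List[str] = field(default_factory=list)

    def contains(self, key: str) -> bool:
        return self.start <= key and (self.end == "" or key < self.end)


def split(region: Region) -> List[Region]:
    """Split a region at its middle row key into two daughter regions."""
    rows = sorted(region.rows)
    half = len(rows) // 2
    mid = rows[half]
    return [
        Region(region.start, mid, rows[:half]),  # keys < mid
        Region(mid, region.end, rows[half:]),    # keys >= mid
    ]


def insert_rows(keys: List[str], max_rows: int) -> List[Region]:
    """Insert keys in order, splitting any region that grows past max_rows."""
    regions = [Region("", "")]  # a new table starts as one region covering all keys
    for key in keys:
        i = next(i for i, r in enumerate(regions) if r.contains(key))
        regions[i].rows.append(key)
        if len(regions[i].rows) > max_rows:
            regions[i : i + 1] = split(regions[i])  # keeps regions sorted by key
    return regions


def rows_per_server(regions: List[Region], servers: List[str]) -> Dict[str, int]:
    """Assign regions round-robin to servers and total the rows each one holds."""
    totals = {s: 0 for s in servers}
    for i, region in enumerate(regions):
        totals[servers[i % len(servers)]] += len(region.rows)
    return totals


if __name__ == "__main__":
    keys = [f"row{n:06d}" for n in range(1000)]
    random.Random(42).shuffle(keys)  # hashed keys arrive in no particular order
    servers = ["rs1", "rs2"]
    # No split yet: {'rs1': 1000, 'rs2': 0}
    print(rows_per_server(insert_rows(keys, max_rows=10_000), servers))
    # After several splits, both servers hold part of the data.
    print(rows_per_server(insert_rows(keys, max_rows=200), servers))
```

The first call shows the situation described above: until the first split, one server owns everything. The second call shows the load spreading out once there are enough regions to distribute.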
Then I started getting this error message:
org.apache.hadoop.hbase.NotServingRegionException

The RegionServer is still running, but I’m not sure exactly what is wrong.
It began when the RS started a split. RegionServer log:
2008-12-16 09:17:14,441 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting split of region

HBaseMaster log:
2008-12-16 09:17:16,902 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: obde_documententries,RQue7uxNoe59vJxljcd1rQ==,1229374459395: [B@674c5b37 from MYIP

After the split, a few more INFO messages are logged and everything seems OK; then, suddenly, all read/write requests produce the above error.
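A split takes the parent region offline and replaces it with two daughter regions, so a client that still points at the parent is told `NotServingRegionException` until it re-reads the region's location from META. HBase's own client already retries and re-locates internally. The sketch below only illustrates that "drop the stale location, look it up again, back off, retry" pattern in isolation. `NotServingRegionError`, `call_with_relocate`, the per-row location cache and the `locate`/`op` callables are hypothetical stand-ins, not HBase APIs. If the error never clears, as in the situation described here, retries alone will not help.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class NotServingRegionError(Exception):
    """Hypothetical stand-in for org.apache.hadoop.hbase.NotServingRegionException."""


def call_with_relocate(
    row: str,
    locate: Callable[[str], str],
    cache: Dict[str, str],
    op: Callable[[str, str], T],
    retries: int = 5,
    backoff_s: float = 0.1,
) -> T:
    """Run op(server, row), re-locating the row's region whenever the server refuses it."""
    for attempt in range(retries):
        if row not in cache:
            cache[row] = locate(row)  # hypothetical fresh lookup, e.g. in META
        try:
            return op(cache[row], row)
        except NotServingRegionError:
            # The cached location is stale (for example, the parent region was
            # taken offline by a split), so forget it and look it up again.
            del cache[row]
            if attempt < retries - 1:
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise NotServingRegionError(f"row {row!r} still not served after {retries} attempts")


if __name__ == "__main__":
    location = {"server": "rs1"}  # where the hypothetical META says the row lives

    def op(server: str, row: str) -> str:
        if server == "rs1":
            location["server"] = "rs2"  # the split finished; the daughter is on rs2
            raise NotServingRegionError(row)
        return f"{row}@{server}"

    print(call_with_relocate("row1", lambda r: location["server"], {}, op, backoff_s=0.0))
    # prints "row1@rs2"
```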
Resolution: After reading several posts on the subject (especially this one), I tried the obvious thing: disabling and then re-enabling the table. This solves the immediate problem, but the error keeps recurring, and I’m not sure why, or whether disabling/enabling actually resolves the problem or just postpones it.
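Since the workaround has to be repeated, it can be scripted. The sketch below assumes the `hbase` launcher is on `PATH` and that `hbase shell` accepts its `disable` / `enable` commands on standard input. The helper names `reset_script` and `reset_table` are hypothetical. The table is unavailable while it is disabled, and this only automates the workaround; it does not address whatever causes the error to recur.

```python
import subprocess
from typing import Sequence


def reset_script(table: str) -> str:
    """Build hbase shell input that disables and then re-enables a table."""
    if "'" in table:
        raise ValueError("table name must not contain a single quote")
    return f"disable '{table}'\nenable '{table}'\nexit\n"


def reset_table(
    table: str, hbase_cmd: Sequence[str] = ("hbase", "shell")
) -> subprocess.CompletedProcess:
    """Pipe the disable/enable script into the HBase shell and return the result.

    The shell may report a failed command and still exit with code 0, so
    inspect result.stdout / result.stderr rather than relying on returncode.
    """
    return subprocess.run(
        list(hbase_cmd),
        input=reset_script(table),
        text=True,
        capture_output=True,
    )
```

For the table in the logs above, this would be `reset_table("obde_documententries")`.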