Tried HBase (a distributed database modeled on Google's Bigtable; is it really a database?) on my 3-node Hadoop cluster.
Download the HBase package (I used version 0.19 in this experiment). I think it only works with Hadoop 0.19 and above.
Setup is as easy as Hadoop's. There are only a few configuration files to edit:
HBase Configuration Files
<$HBASE_INSTALLATION_DIR>/conf/hbase-env.sh (add the JAVA_HOME and HBASE_HOME environment variables)
<$HBASE_INSTALLATION_DIR>/conf/hbase-site.xml
<$HBASE_INSTALLATION_DIR>/conf/regionservers
This is my hbase-site.xml. The hbase.rootdir property is where HBase stores its data, in this case in my 3-node HDFS. The hbase.master property specifies the host and port that the HBase master server runs on. Simple!
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master.testnet.net:54310/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master.testnet.net:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
</configuration>
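To make the structure of that file concrete, here is a minimal sketch in plain Java (no HBase classes) that reads the name/value pairs out of the same XML and pulls apart the hbase.rootdir URI. SiteConfigReader is a made-up helper for illustration only; HBase loads this file itself through its own configuration classes.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not an HBase class): reads the <property> name/value
// pairs out of Hadoop-style configuration XML.
final class SiteConfigReader {

    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> properties = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                properties.put(text(property, "name"), text(property, "value"));
            }
            return properties;
        } catch (Exception e) {
            // Malformed XML or a <property> without <name>/<value> ends up here.
            throw new IllegalArgumentException("Not a valid configuration document", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hbase.rootdir</name>"
                + "<value>hdfs://master.testnet.net:54310/hbase</value></property>"
                + "<property><name>hbase.master</name>"
                + "<value>master.testnet.net:60000</value></property>"
                + "</configuration>";
        Map<String, String> properties = parse(xml);

        // hbase.rootdir is an ordinary URI: scheme, NameNode host and port, HDFS path.
        URI rootDir = URI.create(properties.get("hbase.rootdir"));
        System.out.println("HDFS host: " + rootDir.getHost());
        System.out.println("HDFS port: " + rootDir.getPort());
        System.out.println("HDFS path: " + rootDir.getPath());
        System.out.println("Master:    " + properties.get("hbase.master"));
    }
}
```

The host and port in hbase.rootdir have to match the NameNode address of your own Hadoop setup; the values here are just the ones from my cluster.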
This is my regionservers file:
master.testnet.net
slave1.testnet.net
slave2.testnet.net
HBase needs more open file handles than the default limit of 1024 allows. To raise the limit, edit
/etc/security/limits.conf on all nodes and restart your cluster. Here the limit is raised for the hbase user:
# Each line describes a limit for a user in the form:
#
# domain type item value
#
hbase - nofile 32768
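To check that the new limit really took effect after the restart, something like the following sketch can help. It asks the JVM for the open file limit of the current process, so it should be run as the same user that runs HBase. FileHandleCheck is a hypothetical name, the 32768 threshold is simply the value from my limits.conf entry, and the check relies on com.sun.management.UnixOperatingSystemMXBean, which I am assuming is available (it is on the usual Sun/OpenJDK JVMs on Linux).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports the open file limit seen by the current JVM.
final class FileHandleCheck {

    // Value taken from the limits.conf entry above, not an official threshold.
    static final long RECOMMENDED = 32768;

    // Returns the maximum number of file descriptors for this process,
    // or -1 when the JVM or platform does not expose that number.
    static long maxOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
        }
        return -1;
    }

    static boolean meetsRecommended(long limit) {
        return limit >= RECOMMENDED;
    }

    public static void main(String[] args) {
        long limit = maxOpenFiles();
        if (limit < 0) {
            System.out.println("Open file limit is not available on this JVM.");
        } else if (meetsRecommended(limit)) {
            System.out.println("Open file limit is " + limit + ", looks fine.");
        } else {
            System.out.println("Open file limit is only " + limit
                    + ", check /etc/security/limits.conf.");
        }
    }
}
```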
To start or stop HBase, use the following commands:
<$HBASE_INSTALLATION_DIR>/bin/start-hbase.sh
<$HBASE_INSTALLATION_DIR>/bin/stop-hbase.sh
Once HBase is started, you can interact with it using the HBase shell. Invoke the shell with this command:
<$HBASE_INSTALLATION_DIR>/bin/hbase shell
Some useful HBase shell commands:
create 'blogposts', 'post', 'image'
put 'blogposts', 'post1', 'post:title', 'Hello World'
put 'blogposts', 'post1', 'post:author', 'The Author'
put 'blogposts', 'post1', 'post:body', 'This is a blog post'
put 'blogposts', 'post1', 'image:header', 'image1.jpg'
put 'blogposts', 'post1', 'image:bodyimage', 'image2.jpg'
get 'blogposts', 'post1'
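What those commands do to the table is easier to see with a toy model. The sketch below is plain Java, not the HBase API: TableSketch is a made-up class that mimics a table with fixed column families (named at create time) and free-form family:qualifier columns per row, kept in sorted order. Real HBase also keeps timestamps and versions for each cell, which I leave out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy in-memory model of an HBase-style table. Hypothetical, for illustration only.
final class TableSketch {

    private final Set<String> families;
    // row key -> (column "family:qualifier" -> value), both levels sorted by key
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Like "create": the column families are fixed up front.
    TableSketch(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    // Like "put": the qualifier is free-form, but the family must already exist.
    void put(String row, String column, String value) {
        int separator = column.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("Expected family:qualifier, got " + column);
        }
        String family = column.substring(0, separator);
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        rows.computeIfAbsent(row, key -> new TreeMap<>()).put(column, value);
    }

    // Like "get": all columns of one row, or an empty map for an unknown row.
    SortedMap<String, String> get(String row) {
        SortedMap<String, String> columns = rows.get(row);
        return columns == null ? new TreeMap<String, String>() : columns;
    }

    public static void main(String[] args) {
        TableSketch blogposts = new TableSketch("post", "image");
        blogposts.put("post1", "post:title", "Hello World");
        blogposts.put("post1", "post:author", "The Author");
        blogposts.put("post1", "post:body", "This is a blog post");
        blogposts.put("post1", "image:header", "image1.jpg");
        blogposts.put("post1", "image:bodyimage", "image2.jpg");
        // Columns come back sorted by name, so the image family is listed first.
        blogposts.get("post1").forEach(
                (column, value) -> System.out.println(column + " = " + value));
    }
}
```

The point to take away is that 'post' and 'image' are part of the table definition, while 'title', 'author', 'header' and so on are just data that each row is free to have or not.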
How do we retrieve the data using Java? See the example below.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.RowResult;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class HBaseConnector {

    // Fetches one row of the "blogposts" table as a map of
    // column name ("family:qualifier") to value.
    public static Map<String, String> retrievePost(String postId) throws IOException {
        // HBaseConfiguration picks up hbase-site.xml from the classpath.
        HTable table = new HTable(new HBaseConfiguration(), "blogposts");
        Map<String, String> post = new HashMap<String, String>();
        RowResult result = table.getRow(postId);
        // RowResult maps each column name (as bytes) to the cell holding its value.
        for (byte[] column : result.keySet()) {
            post.put(new String(column), new String(result.get(column).getValue()));
        }
        return post;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> blogpost = HBaseConnector.retrievePost("post1");
        System.out.println(blogpost.get("post:title"));
        System.out.println(blogpost.get("post:author"));
    }
}
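The map that comes back is flat, with keys such as post:title and image:header. If you would rather work with one map per column family, a small helper along these lines would do it. This is only a sketch in plain Java; ColumnGrouper is a name I made up, and it works on any map with family:qualifier keys, so it does not need a running HBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: splits "family:qualifier" keys into one map per family.
final class ColumnGrouper {

    static Map<String, Map<String, String>> byFamily(Map<String, String> row) {
        Map<String, Map<String, String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> entry : row.entrySet()) {
            String column = entry.getKey();
            int separator = column.indexOf(':');
            // A key without ':' is treated as a family with an empty qualifier.
            String family = separator < 0 ? column : column.substring(0, separator);
            String qualifier = separator < 0 ? "" : column.substring(separator + 1);
            grouped.computeIfAbsent(family, key -> new TreeMap<>())
                    .put(qualifier, entry.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by a lookup of row "post1".
        Map<String, String> blogpost = new HashMap<>();
        blogpost.put("post:title", "Hello World");
        blogpost.put("post:author", "The Author");
        blogpost.put("image:header", "image1.jpg");

        Map<String, Map<String, String>> grouped = byFamily(blogpost);
        System.out.println(grouped.get("post").get("title"));
        System.out.println(grouped.get("image").get("header"));
    }
}
```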
At this point, we are HBased! Amazed, aren't you?