Tuesday, April 21, 2009

Moving on to HBase

Tried HBase (a Google bigtable alike distributed database, is it a database ...?) on my 3 nodes Hadoop cluster.

Download the HBase pacakge (I used version 0.19) in this experiment. I think it only works with Hadoop 0.19 and above.

It is as easy as Hadoop setup. Only a few configuration files to play with:

HBase Configuration Files


<$HBASE_INSTALLATION_DIR>\conf\hbase-env.sh (add JAVA_HOME and HBASE_HOME environment variable)
<$HBASE_INSTALLATION_DIR>\conf\hbase-site.xml
<$HBASE_INSTALLATION_DIR>\conf\regionservers


This is my hbase-site.xml. The hbase.rootdir is where HBase store the data. In this case, in my 3 nodes HDFS. The hbase.master specify where the HBase master server runs and the port it uses. Simple!

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master.testnet.net:54310/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.master</name>
<value>master.testnet.net:60000</value>
<description>The host and port that the HBase master runs at.</description>
</property>
</configuration>

This is my regionservers file:


master.testnet.net
slave1.testnet.net
slave2.testnet.net



HBase requires more file handles, so the default 1024 is not enough. To allow for more file handle, edit /etc/security/limits.conf on all nodes and restart your cluster. Please see below:



# Each line describes a limit for a user in the form:
#
# domain type item value
#
hbase - nofile 32768


To start/stop the HBase, use the following command:


<$HBASE_INSTALLATION_DIR>\bin\start-hbase.sh
<$HBASE_INSTALLATION_DIR>\bin\stop-hbase.sh


After the HBase is started, you can interact with the HBase using the HBase Shell. Invoke the HBase shell using this command:


<$HBASE_INSTALLATION_DIR>\bin\hbase shell


Some useful HBase shell command:


create ‘blogposts’, ‘post’, ‘image’

put ‘blogposts’, ‘post1′, ‘post:title’, ‘Hello World’
put ‘blogposts’, ‘post1′, ‘post:author’, ‘The Author’
put ‘blogposts’, ‘post1′, ‘post:body’, ‘This is a blog post’
put ‘blogposts’, ‘post1′, ‘image:header’, ‘image1.jpg’
put ‘blogposts’, ‘post1′, ‘image:bodyimage’, ‘image2.jpg’

get ‘blogposts’, ‘post1′


How do we retrieve the data using Java? See example below.

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.RowResult;

import java.util.HashMap;
import java.util.Map;
import java.io.IOException;

public class HBaseConnector {

public static Map retrievePost(String postId) throws IOException {
HTable table = new HTable(new HBaseConfiguration(), "blogposts");
Map post = new HashMap();

RowResult result = table.getRow(postId);

for (byte[] column : result.keySet()) {
post.put(new String(column), new String(result.get(column).getValue ()));
}

return post;
}

public static void main(String[] args) throws IOException {
Map blogpost = HBaseConnector.retrievePost("post1");
System.out.println(blogpost.get("post:title"));
System.out.println(blogpost.get("post:author"));
}
}

To this point, we are HBased! Amazed, aren't you?

0 comments:

Post a Comment