Passing Time by ...: HBase

To view the metadata of the 'tables' stored in HBase, run this command in the HBase shell (<$HBASE_INSTALLATION_DIR>/bin/hbase shell):

# hbase(main): xx>scan '.META.'

The column=info:server entries in the output show how the tables are distributed over the region servers.

To put a value into a column family without a label, do this:

# hbase(main): xx>put 'table_name', 'row_key', 'column_family_name:', 'value'

To put a value into a column family with a label, do this:

# hbase(main): xx>put 'table_name', 'row_key', 'column_family_name:label_name', 'value'

To work with HBase, you have to throw away all SQL concepts. It is not relational; it is distributed and scalable. You can add a label to a column family dynamically, as and when required. That means the rows in a 'table' are not of equal length: each row holds only the columns that were actually written to it.
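To make the "rows are not of equal length" idea concrete, here is a minimal sketch of the data model using plain Java maps. It is a toy in-memory stand-in, not the HBase API; the class name SparseTableSketch and the row and column names are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of an HBase-style table. This is NOT the HBase API. */
public class SparseTableSketch {

    // row key -> (column name "family:label" -> value); both levels kept sorted
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    /** Stores a value under a column such as "cf:a", or "cf:" for an empty label. */
    public void put(String rowKey, String column, String value) {
        if (!column.contains(":")) {
            throw new IllegalArgumentException("column must look like family:label");
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the cells of one row, or an empty map if the row does not exist. */
    public Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptyMap());
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("row1", "cf:a", "1");
        table.put("row1", "cf:b", "2"); // a new label, added on the fly
        table.put("row2", "cf:", "3");  // column family without a label
        System.out.println(table.get("row1")); // {cf:a=1, cf:b=2}
        System.out.println(table.get("row2")); // {cf:=3}
    }
}
```

Nothing declares the labels up front: row1 ends up with two cells and row2 with one, which is the behaviour described above.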

I tried HBase (a distributed database modeled on Google's Bigtable; is it a database ...?) on my 3-node Hadoop cluster.

Download the HBase package (I used version 0.19 in this experiment). I think it only works with Hadoop 0.19 and above.

It is as easy as the Hadoop setup. There are only a few configuration files to play with:

HBase Configuration Files

<$HBASE_INSTALLATION_DIR>/conf/hbase-env.sh (add the JAVA_HOME and HBASE_HOME environment variables)
<$HBASE_INSTALLATION_DIR>/conf/hbase-site.xml
<$HBASE_INSTALLATION_DIR>/conf/regionservers

This is my hbase-site.xml. The hbase.rootdir property is where HBase stores its data; in this case, on my 3-node HDFS. The hbase.master property specifies where the HBase master server runs and the port it uses. Simple!

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master.testnet.net:54310/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.master</name>
<value>master.testnet.net:60000</value>
<description>The host and port that the HBase master runs at.</description>
</property>
</configuration>
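If you want to double-check what such a file actually contains, the name/value pairs can be read back with the XML parser that ships with the JDK. This is only a sketch under my own assumptions: the class name SiteConfigSketch and the method readProperties are made up, and this is not how HBase itself loads its configuration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Reads <property> name/value pairs from a Hadoop-style XML config (illustrative only). */
public class SiteConfigSketch {

    public static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                // Each <property> is expected to hold one <name> and one <value>.
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                result.put(name, value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hbase.rootdir</name>"
                + "<value>hdfs://master.testnet.net:54310/hbase</value></property>"
                + "<property><name>hbase.master</name>"
                + "<value>master.testnet.net:60000</value></property>"
                + "</configuration>";
        Map<String, String> conf = readProperties(xml);
        System.out.println(conf.get("hbase.rootdir")); // hdfs://master.testnet.net:54310/hbase
        System.out.println(conf.get("hbase.master"));  // master.testnet.net:60000
    }
}
```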

This is my regionservers file:

master.testnet.net
slave1.testnet.net
slave2.testnet.net

HBase requires more file handles than the default limit of 1024. To allow more file handles, edit /etc/security/limits.conf on all nodes and restart your cluster. Please see below:

# Each line describes a limit for a user in the form:
#
# domain type item value
#
hbase - nofile 32768
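With several nodes to edit, it is easy to miss one. As a rough sanity check, the sketch below scans the text of a limits.conf file for the nofile entry of a given user. The class and method names are hypothetical, it returns the first matching line only, and it does not handle non-numeric values such as "unlimited".

```java
/** Looks up the nofile entry for one user in limits.conf-style text (illustrative only). */
public class NofileLimitSketch {

    /** Returns the nofile value of the first matching line, or -1 if none matches. */
    public static long nofileLimit(String limitsConf, String domain) {
        for (String line : limitsConf.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Expected layout: domain type item value
            String[] fields = trimmed.split("\\s+");
            if (fields.length == 4 && fields[0].equals(domain) && fields[2].equals("nofile")) {
                return Long.parseLong(fields[3]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String conf = "# domain type item value\n"
                + "#\n"
                + "hbase - nofile 32768\n";
        System.out.println(nofileLimit(conf, "hbase")); // 32768
        System.out.println(nofileLimit(conf, "other")); // -1
    }
}
```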

To start or stop HBase, use the following commands:

<$HBASE_INSTALLATION_DIR>/bin/start-hbase.sh
<$HBASE_INSTALLATION_DIR>/bin/stop-hbase.sh

Once HBase is started, you can interact with it through the HBase shell. Invoke the shell with this command:

<$HBASE_INSTALLATION_DIR>/bin/hbase shell

Some useful HBase shell commands:

create 'blogposts', 'post', 'image'

put 'blogposts', 'post1', 'post:title', 'Hello World'
put 'blogposts', 'post1', 'post:author', 'The Author'
put 'blogposts', 'post1', 'post:body', 'This is a blog post'
put 'blogposts', 'post1', 'image:header', 'image1.jpg'
put 'blogposts', 'post1', 'image:bodyimage', 'image2.jpg'

get 'blogposts', 'post1'

How do we retrieve the data using Java? See example below.

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.RowResult;

import java.util.HashMap;
import java.util.Map;
import java.io.IOException;

public class HBaseConnector {

    // Fetches one row from the 'blogposts' table and returns it as
    // a map of column name ("family:label") to value.
    public static Map<String, String> retrievePost(String postId) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "blogposts");
        Map<String, String> post = new HashMap<String, String>();

        RowResult result = table.getRow(postId);

        // Column names and cell values come back as byte arrays,
        // so convert both to String before storing them.
        for (byte[] column : result.keySet()) {
            post.put(new String(column), new String(result.get(column).getValue()));
        }

        return post;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> blogpost = HBaseConnector.retrievePost("post1");
        System.out.println(blogpost.get("post:title"));
        System.out.println(blogpost.get("post:author"));
    }
}
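One detail worth spelling out is why the loop converts the byte[] column names to String before putting them into the HashMap. The sketch below uses plain java.util maps as a stand-in for the row result (the real class may compare keys differently); the class name ColumnKeySketch and the explicit UTF-8 charset are my own choices for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Shows why byte[] column names are decoded before use as HashMap keys (illustrative only). */
public class ColumnKeySketch {

    /** Copies byte-keyed cells into a String-keyed map, decoding both sides as UTF-8. */
    public static Map<String, String> toStringMap(Map<byte[], byte[]> cells) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<byte[], byte[]> cell : cells.entrySet()) {
            out.put(new String(cell.getKey(), StandardCharsets.UTF_8),
                    new String(cell.getValue(), StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<byte[], byte[]> cells = new HashMap<>();
        cells.put("post:title".getBytes(StandardCharsets.UTF_8),
                  "Hello World".getBytes(StandardCharsets.UTF_8));

        // Arrays use identity-based equals/hashCode, so a fresh byte[] with
        // the same contents does not match the stored key in a plain HashMap.
        System.out.println(cells.containsKey("post:title".getBytes(StandardCharsets.UTF_8))); // false

        // After decoding, lookups by column name work as expected.
        System.out.println(toStringMap(cells).get("post:title")); // Hello World
    }
}
```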

At this point, we are HBased! Amazed, aren't you?

Wednesday, April 22, 2009

More on HBase Part 1

Tuesday, April 21, 2009

Moving on to HBase
