
Wednesday, April 22, 2009

More on HBase Part 1

To view the metadata of the tables stored in HBase, run this command in the HBase shell (<$HBASE_INSTALLATION_DIR>/bin/hbase shell):


# hbase(main): xx>scan '.META.'



The info:server column shows which region server hosts each region, so you can see how the tables are distributed across the region servers.
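Conceptually, each table is split into regions that cover contiguous, sorted ranges of row keys, and .META. records which region server hosts each one. The Java sketch below is a toy model of that lookup, not the HBase client API. It assumes plain ASCII row keys (so Java String ordering matches HBase's byte-wise ordering), and the class name, start keys and server addresses are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the .META. idea (hypothetical class, NOT the HBase API):
// each region is identified by its start key and served by one region server.
class RegionLookup {
    // start key -> server hosting the region that begins at that key.
    // The first region of a table has the empty start key "".
    private final TreeMap<String, String> regions = new TreeMap<String, String>();

    void addRegion(String startKey, String server) {
        regions.put(startKey, server);
    }

    // The region holding rowKey is the one with the greatest start key <= rowKey.
    // Returns null if no region starts at or before rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regions.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionLookup meta = new RegionLookup();
        // Illustrative hosts and ports only.
        meta.addRegion("", "master.testnet.net:60020");
        meta.addRegion("post5", "slave1.testnet.net:60020");
        meta.addRegion("post8", "slave2.testnet.net:60020");
        System.out.println(meta.serverFor("post1")); // falls in the first region
        System.out.println(meta.serverFor("post6")); // falls in the "post5" region
    }
}
```

A sorted map with a floor lookup is the natural fit here because regions never overlap and are ordered by start key.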

To put a value into a column family without a label, do this:

# hbase(main): xx>put 'table_name', 'row_key', 'column_family_name:', 'value'

To put a value into a column family with a label, do this:

# hbase(main): xx>put 'table_name', 'row_key', 'column_family_name:label_name', 'value'
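In both cases the column argument has the form family:label, and "no label" is written as a trailing colon. A minimal sketch of splitting such a name, using a hypothetical ColumnName class that is not part of HBase; it splits at the first colon on the assumption that family names never contain one.

```java
// Hypothetical helper (not part of HBase): splits a shell-style column name
// "family:label" into its family and label. "family:" yields an empty label.
class ColumnName {
    final String family;
    final String label;

    private ColumnName(String family, String label) {
        this.family = family;
        this.label = label;
    }

    static ColumnName parse(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in column name: " + column);
        }
        // Everything after the first colon is the label (possibly empty).
        return new ColumnName(column.substring(0, colon), column.substring(colon + 1));
    }

    boolean hasLabel() {
        return label.length() > 0;
    }

    public static void main(String[] args) {
        ColumnName withLabel = ColumnName.parse("post:title");
        ColumnName noLabel = ColumnName.parse("post:");
        System.out.println(withLabel.family + " / " + withLabel.label);
        System.out.println(noLabel.family + " has label: " + noLabel.hasLabel());
    }
}
```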

To work with HBase, you have to throw away your SQL concepts. HBase is not relational; it is distributed and scalable. You can add a label to a column family dynamically, whenever you need one. That means the rows in a 'table' do not have to be the same length.
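To make the "rows of unequal length" point concrete, here is a toy in-memory model, assuming a table is just a sorted map from row key to a sorted map of family:label columns. SparseTable is a hypothetical name, and unlike HBase it does not check that the column family was declared when the table was created.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model (hypothetical, not HBase's storage format): row key -> (column -> value).
// A row only holds the columns actually written to it.
class SparseTable {
    private final SortedMap<String, SortedMap<String, String>> rows =
        new TreeMap<String, SortedMap<String, String>>();

    void put(String rowKey, String column, String value) {
        SortedMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            row = new TreeMap<String, String>();
            rows.put(rowKey, row);
        }
        row.put(column, value);
    }

    // Number of columns stored in a row (0 if the row does not exist).
    int rowLength(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }

    public static void main(String[] args) {
        SparseTable t = new SparseTable();
        t.put("row1", "post:title", "Hello World");
        t.put("row1", "post:author", "The Author");
        // A new label in the same family, used by this row only.
        t.put("row2", "post:tags", "hbase");
        System.out.println(t.rowLength("row1") + " " + t.rowLength("row2"));
    }
}
```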

Tuesday, April 21, 2009

Moving on to HBase

I tried HBase (a distributed database modeled on Google's Bigtable; is it even a database...?) on my 3-node Hadoop cluster.

Download the HBase package (I used version 0.19 for this experiment). I think it only works with Hadoop 0.19 and above.

Setting it up is as easy as setting up Hadoop. There are only a few configuration files to edit:

HBase Configuration Files


<$HBASE_INSTALLATION_DIR>/conf/hbase-env.sh (set the JAVA_HOME and HBASE_HOME environment variables)
<$HBASE_INSTALLATION_DIR>/conf/hbase-site.xml
<$HBASE_INSTALLATION_DIR>/conf/regionservers


This is my hbase-site.xml. hbase.rootdir is where HBase stores its data; in this case, that is my 3-node HDFS. hbase.master specifies the host the HBase master server runs on and the port it uses. Simple!

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master.testnet.net:54310/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master.testnet.net:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
</configuration>
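HBase reads this file through its own configuration classes. Purely to illustrate the name/value structure, here is a sketch that pulls the properties out with the JDK's built-in DOM parser. SiteXmlReader is a hypothetical class, and it assumes every <property> element has both a <name> and a <value>.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch only: collects <property><name>/<value> pairs from an
// hbase-site.xml-style stream, preserving file order.
class SiteXmlReader {
    static Map<String, String> readProperties(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        Map<String, String> props = new LinkedHashMap<String, String>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element p = (Element) properties.item(i);
            // Assumes each property has exactly one <name> and one <value>.
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
            + "<property><name>hbase.rootdir</name>"
            + "<value>hdfs://master.testnet.net:54310/hbase</value></property>"
            + "<property><name>hbase.master</name>"
            + "<value>master.testnet.net:60000</value></property>"
            + "</configuration>";
        Map<String, String> props =
            readProperties(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println(props.get("hbase.rootdir"));
        System.out.println(props.get("hbase.master"));
    }
}
```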

This is my regionservers file:


master.testnet.net
slave1.testnet.net
slave2.testnet.net
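The format is simply one host name per line. A small sketch of reading such a list (RegionServersFile is a hypothetical name); as a defensive assumption it also skips blank lines and anything after a '#'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: reads a regionservers-style host list, one hostname per line.
class RegionServersFile {
    static List<String> readHosts(Reader source) throws IOException {
        List<String> hosts = new ArrayList<String>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            // Assumption: '#' starts a comment; drop it and surrounding whitespace.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.length() > 0) {
                hosts.add(line);
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        String file = "master.testnet.net\nslave1.testnet.net\nslave2.testnet.net\n";
        System.out.println(readHosts(new StringReader(file)));
    }
}
```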



HBase needs more file handles than the default limit of 1024 allows. To raise the limit, edit /etc/security/limits.conf on all nodes and restart your cluster. For example:



# Each line describes a limit for a user in the form:
#
# domain type item value
#
hbase - nofile 32768


To start or stop HBase, use the following commands:


<$HBASE_INSTALLATION_DIR>/bin/start-hbase.sh
<$HBASE_INSTALLATION_DIR>/bin/stop-hbase.sh


Once HBase is running, you can interact with it through the HBase shell. Start the shell with this command:


<$HBASE_INSTALLATION_DIR>/bin/hbase shell


Some useful HBase shell commands:


create 'blogposts', 'post', 'image'

put 'blogposts', 'post1', 'post:title', 'Hello World'
put 'blogposts', 'post1', 'post:author', 'The Author'
put 'blogposts', 'post1', 'post:body', 'This is a blog post'
put 'blogposts', 'post1', 'image:header', 'image1.jpg'
put 'blogposts', 'post1', 'image:bodyimage', 'image2.jpg'

get 'blogposts', 'post1'


How do we retrieve the data using Java? See the example below.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.RowResult;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class HBaseConnector {

    // Reads every column of one row of the 'blogposts' table into a map
    // keyed by "family:label" (e.g. "post:title").
    public static Map<String, String> retrievePost(String postId) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "blogposts");
        Map<String, String> post = new HashMap<String, String>();

        RowResult result = table.getRow(postId);
        if (result == null) {
            return post; // defensive: treat a missing result as an empty row
        }

        // Column names and cell values are raw bytes; decode them explicitly as UTF-8.
        for (byte[] column : result.keySet()) {
            post.put(new String(column, "UTF-8"),
                     new String(result.get(column).getValue(), "UTF-8"));
        }

        return post;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> blogpost = HBaseConnector.retrievePost("post1");
        System.out.println(blogpost.get("post:title"));
        System.out.println(blogpost.get("post:author"));
    }
}

At this point, we are HBased! Amazed, aren't you?