Sunday, April 19, 2009

Getting Hadoop'ed

The official site of Apache Hadoop is http://hadoop.apache.org/.

I started off with the latest available beta version of Hadoop Core, 0.19.1. The latest stable version at the time of this experiment was 0.18.3.

I first tried setting it up on a Windows Vista box at home, thinking that it would be easier, but it is not. For Hadoop to run on Windows you need Cygwin, and I ran into tons of access control issues on Vista. I have not gotten it to run YET!

Concurrently, I tried it on Fedora 9 in a VM, and it works like a breeze (after a while).

A few configuration files to play with (paths here and below are relative to the Hadoop installation directory; a minimal hadoop-site.xml sketch follows the list):

/conf/hadoop-env.sh (set the JAVA_HOME environment variable here)
/conf/hadoop-default.xml (read-only defaults; do not edit this file)
/conf/hadoop-site.xml (site-specific settings that override the defaults)
/conf/masters (hosts that run the secondary NameNode)
/conf/slaves (hosts that run a DataNode and TaskTracker)
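
For a single-node (pseudo-distributed) setup, hadoop-site.xml can stay very small. A minimal sketch, assuming HDFS on localhost:9000 and the JobTracker on localhost:9001 (change the host names and ports to suit your machine):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>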


The following commands will come in handy.

SSH:

To check whether sshd (the SSH daemon) is enabled:
$ chkconfig --list sshd

To enable sshd at runlevels 2, 3, 4 and 5:
$ chkconfig --level 2345 sshd on

To generate a passphrase-less DSA key pair and authorize it for passwordless logins (the Hadoop start scripts use ssh to launch the daemons):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

On Fedora 11, you will also need to allow sshd host key authentication in the SELinux Management tool; otherwise ssh will still prompt for a password even with the key pair configured above.
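
If SELinux still gets in the way, the security context on ~/.ssh is another common culprit; restoring the default labels is harmless to try:

$ restorecon -R -v ~/.ssh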

Make sure the authorized_keys file has the right permissions:
$ chmod 644 ~/.ssh/authorized_keys
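
A quick way to verify the whole setup: ssh to localhost, and it should log you in without asking for a password.

$ ssh localhost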




Firewall (iptables):

To save the current rules:
$ /etc/init.d/iptables save

To stop the firewall so that the Hadoop ports are not blocked (see below for an alternative that opens only the needed ports):
$ /etc/init.d/iptables stop

To start it again:
$ /etc/init.d/iptables start
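
Rather than stopping the firewall outright, you can open just the Hadoop ports. A sketch, assuming the default web UI ports plus the HDFS and JobTracker ports from the hadoop-site.xml sketch above (adjust to your own configuration):

$ iptables -I INPUT -p tcp --dport 50070 -j ACCEPT   # NameNode web UI
$ iptables -I INPUT -p tcp --dport 50030 -j ACCEPT   # JobTracker web UI
$ iptables -I INPUT -p tcp --dport 9000 -j ACCEPT    # HDFS (fs.default.name)
$ iptables -I INPUT -p tcp --dport 9001 -j ACCEPT    # JobTracker (mapred.job.tracker)
$ /etc/init.d/iptables save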


Java:

To register the JDK with the alternatives system (point the path at wherever your JDK is installed):
$ /usr/sbin/alternatives --install /usr/bin/java java /usr/java/jdk1.6.0_14/bin/java 2

To choose which java the system uses by default:
$ /usr/sbin/alternatives --config java
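
After switching, confirm that the system now picks up the JDK you expect:

$ java -version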


Hadoop:

To format a new HDFS filesystem (do this once, before starting the daemons for the first time):
$ /bin/hadoop namenode -format

To start HDFS (the NameNode and DataNodes):
$ /bin/start-dfs.sh

To start MapReduce (the JobTracker and TaskTrackers):
$ /bin/start-mapred.sh

To start both at once:
$ /bin/start-all.sh

To copy files into HDFS, copy them back out, and print a file to the console (a quick smoke test follows below):
$ /bin/hadoop fs -put
$ /bin/hadoop fs -get
$ /bin/hadoop fs -cat
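
A quick smoke test once the daemons are up, using made-up file names (input.txt and wc-out are just placeholders) and the word count example from the examples jar that ships with the release (the exact jar name depends on the version):

$ /bin/hadoop fs -put input.txt input.txt
$ /bin/hadoop fs -cat input.txt
$ /bin/hadoop fs -get input.txt input-copy.txt
$ /bin/hadoop jar hadoop-0.19.1-examples.jar wordcount input.txt wc-out
$ /bin/hadoop fs -cat wc-out/*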


Hadoop Web Interfaces (default ports):

NameNode - http://localhost:50070/

JobTracker - http://localhost:50030/
