To check the health and block information of a file, do this
<$HADOOP_INSTALLATION_DIR>/bin/hadoop fsck <path> [-files [-blocks [-locations]]]
To change the replication factor of a file, do this
<$HADOOP_INSTALLATION_DIR>/bin/hadoop fs -setrep [-R] <rep> <path>
The -R flag applies the change recursively to every file under a directory.
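As a rough sketch, the two commands above can be combined: change the replication factor of a directory, then run fsck on it to see where the blocks ended up. The fallback install path /opt/hadoop, the helper names run and set_and_check_replication, and the example path in the usage comment are all made up for illustration. Setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/bin/sh
# Assumption: HADOOP_INSTALLATION_DIR points at the Hadoop install;
# the /opt/hadoop fallback is hypothetical.
HADOOP="${HADOOP_INSTALLATION_DIR:-/opt/hadoop}/bin/hadoop"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the replication factor recursively, then report files, blocks
# and block locations for the same path.
set_and_check_replication() {
    rep="$1"
    target="$2"
    run "$HADOOP" fs -setrep -R "$rep" "$target" &&
    run "$HADOOP" fsck "$target" -files -blocks -locations
}

# Usage (hypothetical path):
#   set_and_check_replication 2 /user/alice/logs
```

Re-replication happens in the background, so the fsck report taken right after setrep may still show the old replica count for some blocks.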
To start a NameNode/JobTracker on a node, do this
<$HADOOP_INSTALLATION_DIR>/bin/hadoop namenode
<$HADOOP_INSTALLATION_DIR>/bin/hadoop jobtracker
To start a DataNode/TaskTracker on a slave node, do this
<$HADOOP_INSTALLATION_DIR>/bin/hadoop datanode
<$HADOOP_INSTALLATION_DIR>/bin/hadoop tasktracker
To rebalance the distribution of blocks across the DataNodes in a cluster, do this
<$HADOOP_INSTALLATION_DIR>/bin/hadoop balancer
Hadoop is "rack"-aware, i.e. it assumes that each node belongs to a rack and that a deployment spans multiple racks. This explains the replica placement policy for dfs.replication = 3: 'one replica on a node in the rack, another replica on a different node in the same rack, and the third on a different node in a different rack'. If no rack is specified, the rackid is 'defaultrack'. Hadoop lets cluster administrators decide which rack a node belongs to through the configuration variable dfs.network.script. When this script is configured, each node runs it to determine its rackid. See Hadoop JIRA HADOOP-692. Some reference material here: Rack_aware_HDFS_proposal.pdf
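To give an idea of what such a rack script might look like, here is a minimal sketch that maps a node's IP address to a rackid. The subnets, the rack names rack1 and rack2, and the function name rack_for_ip are invented for the example. How the script receives its input and what output Hadoop expects from it are also assumptions here, so check HADOOP-692 for the actual contract.

```shell
#!/bin/sh
# Hypothetical mapping from a node's IP address to a rackid.
# The subnets and rack names below are examples only.
rack_for_ip() {
    case "$1" in
        10.1.1.*) echo "rack1" ;;
        10.1.2.*) echo "rack2" ;;
        # Fall back to the default rackid mentioned above.
        *)        echo "defaultrack" ;;
    esac
}

# Usage (assumes the node's IP address is passed as the first argument):
#   rack_for_ip "$1"
```

Keeping the mapping in one case statement makes it easy to see which subnet lands in which rack. A larger cluster would probably read the mapping from a file instead.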
That's all for now. Continue to Hadoop ...