# For Ubuntu 11
sudo apt-add-repository ppa:flexiondotorg/java
sudo apt-get update
sudo apt-get install sun-java6-jdk
# if the installation is inside a VM and behind a proxy
# In addition to configuring the proxies, tell sudo to preserve the environment with the -E flag
export http_proxy=http://<proxy>:<port>
export https_proxy=http://<proxy>:<port>
sudo -E apt-add-repository ppa:flexiondotorg/java
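If apt-get itself still cannot reach the network behind the proxy, the proxy can also be configured persistently for apt; the file name 95proxy below is just an arbitrary choice:
echo 'Acquire::http::Proxy "http://<proxy>:<port>/";' | sudo tee /etc/apt/apt.conf.d/95proxy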
Check whether Java is installed:
java -version
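If more than one JDK ends up installed, the Sun JDK can be selected as the default; this step is optional when only one Java is present:
sudo update-java-alternatives -s java-6-sun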
sudo addgroup hadoop
sudo adduser --ingroup hadoop ubuntu
In the following we use the default ubuntu user of the Amazon EC2 instance.
su - ubuntu
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
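It is worth verifying that passwordless ssh to localhost works before starting Hadoop; the first connection only asks to confirm the host key. A quick check:
chmod 600 $HOME/.ssh/authorized_keys   # some sshd setups require restrictive permissions
ssh localhost exit                     # should not prompt for a password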
wget https://archive.apache.org/dist/hadoop/core/hadoop-0.20.205.0/hadoop-0.20.205.0.tar.gz
sudo tar xzf hadoop-0.20.205.0.tar.gz
sudo chown -R ubuntu hadoop-0.20.205.0
sudo mkdir -p /home/ubuntu/myhdfs
sudo chown ubuntu:ubuntu /home/ubuntu/myhdfs
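Optionally, exporting HADOOP_HOME and extending PATH saves typing the bin/ prefix in later commands; this assumes the tarball was unpacked in the home directory and is purely a convenience, the rest of this guide keeps the explicit paths:
export HADOOP_HOME=$HOME/hadoop-0.20.205.0
export PATH=$PATH:$HADOOP_HOME/bin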
Edit conf/hadoop-env.sh and set:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Add the following between the <configuration> ... </configuration> tags in conf/core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/myhdfs</value>
<!-- the default location would be /tmp/hadoop-<username>/dfs -->
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
Similarly, add the following to conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
And to conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
hadoop-0.20.205.0/bin/hadoop namenode -format
hadoop-0.20.205.0/bin/start-all.sh
Use bin/hadoop fsck / to check whether all data nodes are up.
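Another quick sanity check is jps from the JDK, which on a working single-node setup should list the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker processes; the NameNode and JobTracker web interfaces are on ports 50070 and 50030 by default.
jps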
Installing ssh, Java 6 and Hadoop on the new slave node.
Adding a hadoop user if necessary.
Copying the master's key to the slave; on the master node, type the following:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub ubuntu@159.xxx.xxx.xxx
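As with the local key setup, it is worth confirming that the master can now log in to the slave without a password (the IP below is the same placeholder as above):
ssh ubuntu@159.xxx.xxx.xxx exit   # should not prompt for a password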
The following configuration involves files that are all under hadoop_home/conf.
Download and unpack Hadoop on the new node.
Edit hadoop-env.sh as we did for the master node. Copy core-site.xml, hdfs-site.xml and mapred-site.xml from the master to the slave node. In this case there are no custom settings on the new node, but localhost should be replaced with the IP of the master node.
Edit the slaves file of the master node, appending the IP address or hostname of the slave node at the end of the file:
localhost
159.xxx.xxx.xxx
Run bin/start-all.sh on the master. After a moment, the node will be initialized and appear on the web admin page of the master.
Alternatively, without restarting the whole cluster: make sure the new node is listed in the Hadoop_Home/conf/slaves file located on the master node, and that it is not listed in the exclude file. Running bin/hadoop-daemon.sh start datanode and bin/hadoop-daemon.sh start tasktracker on the new node will start the data storage and task tracker processes, therefore adding it to the cluster. bin/hadoop dfsadmin -refreshNodes must then be run on the master server; this forces the master to repopulate the list of valid nodes from the slaves and exclude files.
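Whether the cluster is restarted with start-all.sh or the daemons are started individually, the DFS report is a convenient way to confirm that the new datanode has joined; run it on the master:
bin/hadoop dfsadmin -report   # the new node should show up as a live datanode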
DFS Format
bin/hadoop namenode -format # DFS format command
Format aborted in /home/hduser/hadooptmp/dfs/name
11/10/25 04:29:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
If the NameNode is already shut down, go to the dfs directory and manually delete all files. After this, run the format command again.
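A minimal sketch of that recovery, assuming hadoop.tmp.dir is the /home/ubuntu/myhdfs directory configured above (adjust the path to your own setting):
bin/stop-all.sh                # make sure the NameNode is not running
rm -rf /home/ubuntu/myhdfs/*   # manually delete the old DFS data
bin/hadoop namenode -format    # then re-run the format command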
Name node is stuck in safe mode: org.apache.hadoop.dfs.SafeModeException
Use the following command to turn off the safe mode.
bin/hadoop dfsadmin -safemode leave
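The current state of safe mode can also be queried directly:
bin/hadoop dfsadmin -safemode get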
Following many Hadoop tutorials, I ran into the following problems with the Hadoop Eclipse plugin.
The possible error messages:
Error:null
Error: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
Error: Call to localhost/127.0.0.1:54310 failed on connection exception:
java.net.ConnectException: Connection Refused.
Fix:
Install Cygwin, then append the Cygwin directories to the Path environment variable: ;c:\cygwin\bin;c:\cygwin\usr\bin
Restart Eclipse and the problem should be fixed.
Permission denied: user=xxx\xxxxx, access=WRITE, inode="":hduser:supergroup:rwxr-xr-
Fix:
Change the permissions. This must be done on the DFS file system:
hadoop fs -chmod -R ugo+rwx /user
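Listing the directory afterwards shows whether the new permissions took effect; /user is the DFS path from the error message above:
hadoop fs -ls /user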
A ready-made VM image, hadoop-appliance-0.18.0.vmx, is available via the Yahoo! Hadoop tutorial: http://developer.yahoo.com/hadoop/tutorial/index.html