Prerequisites
Java installation
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
The JDK is installed at /usr/lib/jvm/java-8-oracle
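To double-check where the JDK actually landed, the path of the java binary can be resolved and trimmed back to its installation directory. This is a minimal sketch, assuming GNU readlink is available (as on Ubuntu); java_home_from_bin is a hypothetical helper name, not part of any package:

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary.
java_home_from_bin() {
  dir="${1%/bin/java}"   # strip the trailing /bin/java
  dir="${dir%/jre}"      # Java 8 JDKs often nest the binary under jre/
  printf '%s\n' "$dir"
}

# Follow the alternatives symlinks to the real binary, if java is on the PATH.
if command -v java >/dev/null 2>&1; then
  java_home_from_bin "$(readlink -f "$(command -v java)")"
else
  echo "java not found on PATH" >&2
fi
```

The printed directory is the value to use for JAVA_HOME in the later steps.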
Adding a dedicated Hadoop system user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
This adds the group hadoop and the user hduser to your local machine.
Configuring SSH
$ sudo apt-get install ssh
user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$
hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$
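Running the cat command above more than once appends the same key again each time. A minimal sketch of an idempotent variant follows; authorize_key is a hypothetical helper name, and the chmod calls are an addition on the assumption that sshd runs with its default StrictModes setting, which rejects key files that other users can write to:

```shell
# Append a public key to an authorized_keys file only if it is not already listed.
authorize_key() {
  pub="$1"    # public key file, e.g. $HOME/.ssh/id_rsa.pub
  auth="$2"   # target file, e.g. $HOME/.ssh/authorized_keys
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -x: whole-line match, -F: fixed string, -f: take the pattern from the key file
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 700 "$(dirname "$auth")"
  chmod 600 "$auth"
}

# Usage, as hduser:
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```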
Disabling IPv6
To disable IPv6 on Ubuntu, open /etc/sysctl.conf in an editor and add the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
You have to reboot your machine in order to make the changes take effect.
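After the reboot, the kernel exposes the current setting under /proc, so it can be checked directly. A minimal sketch, where ipv6_status is a hypothetical helper name and the optional argument exists only so the function can be pointed at another file:

```shell
# Report whether IPv6 is disabled, based on the kernel flag file.
ipv6_status() {
  flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  if [ ! -r "$flag_file" ]; then
    echo "unknown"      # flag file missing, e.g. no IPv6 support in the kernel
  elif [ "$(cat "$flag_file")" = "1" ]; then
    echo "disabled"
  else
    echo "enabled"
  fi
}

ipv6_status
```

If this still reports "enabled", the lines in /etc/sysctl.conf have probably not been loaded yet.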
Hadoop Installation
Download the Hadoop 2.7.3 tarball (hadoop-2.7.3.tar.gz) into /usr/local and extract it there:
$ cd /usr/local
$ sudo tar xzf hadoop-2.7.3.tar.gz
$ sudo mv hadoop-2.7.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
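The commands above expect hadoop-2.7.3.tar.gz to be present in /usr/local already. A minimal sketch of fetching it, assuming wget is installed and that the release is hosted under the Apache archive layout used in the URL below (an assumption, not something this post states); hadoop_tarball_url and fetch_hadoop are hypothetical helper names:

```shell
HADOOP_VERSION="2.7.3"

# Build the download URL for a given Hadoop version.
# Assumption: the release is hosted on archive.apache.org with this layout.
hadoop_tarball_url() {
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' "$1" "$1"
}

# Download the tarball into the given directory (needs write access there).
fetch_hadoop() {
  wget -P "$1" "$(hadoop_tarball_url "$HADOOP_VERSION")"
}

# Print the URL only; call "fetch_hadoop /usr/local" from a root shell to download.
hadoop_tarball_url "$HADOOP_VERSION"
```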
Update $HOME/.bashrc
as root
$ sudo gedit /home/hduser/.bashrc
and make the following changes:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Configuration
as root
hadoop-env.sh
$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
and change the line
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
core-site.xml
Use the following ports for the different services:
54310: HDFS.
54311: MapReduce Job tracker.
Run the following commands to create the directory where Hadoop will store its data files (/home/hduser/tmp):
$ sudo mkdir /home/hduser/tmp
$ sudo chown hduser:hadoop /home/hduser/tmp
$ sudo chmod 750 /home/hduser/tmp
$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following lines between the <configuration> and </configuration> tags:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>The base for other Hadoop temporary directories and files.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The URI of the default file system.
</description>
</property>
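For orientation, this is roughly what the complete file looks like once both properties are in place. The sketch below writes such a file to a path of your choice; write_core_site is a hypothetical helper name, the hadoop.tmp.dir value assumes the /home/hduser/tmp directory created with mkdir earlier, and the function overwrites its target, so try it on a scratch path first:

```shell
# Write a minimal core-site.xml containing the two properties from this section.
# Note: this overwrites the target file, including any header comments in it.
write_core_site() {
  cat > "$1" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_core_site /tmp/core-site.xml && cat /tmp/core-site.xml
```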
mapred-site.xml
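If mapred-site.xml does not exist yet (Hadoop 2.7.3 ships only mapred-site.xml.template), create it from the template first:
$ sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml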
Add the following lines between the <configuration> and </configuration> tags (see Figure 6):
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port where the MapReduce job tracker runs at.</description>
</property>
hdfs-site.xml
$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following lines between the <configuration> and </configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
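With three files edited by hand, a quick way to confirm that a property was saved as intended can be useful. The sketch below is a naive, line-oriented reader, not a real XML parser: it assumes each <name> and <value> element sits on a single line, as in the snippets in this post. get_prop is a hypothetical helper name:

```shell
# Print the value of a named property from a Hadoop *-site.xml file.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
get_prop() {
  awk -v want="$2" '
    /<name>/  { name = $0; sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name) }
    /<value>/ { val = $0; sub(/.*<value>/, "", val); sub(/<\/value>.*/, "", val)
                if (name == want) print val }
  ' "$1"
}

# Usage:
#   get_prop /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication
```

Once the Hadoop binaries are on the PATH, hdfs getconf -confKey dfs.replication should report the value Hadoop itself resolves.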
Format Namenode
$ hdfs namenode -format
Start Hadoop Service
$ start-dfs.sh
....
$ start-yarn.sh
....
$ jps
If everything is successful, you should see the following services running:
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
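Comparing the jps listing against the expected services by eye is easy to get wrong. A minimal sketch that reads jps output on standard input and prints the name of every expected daemon that is missing; missing_daemons is a hypothetical helper name, and the daemon list is taken from the output above:

```shell
# Read jps output on stdin and print each expected Hadoop daemon that is absent.
missing_daemons() {
  out="$(cat)"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$d"
    fi
  done
}

# Usage, as hduser after start-dfs.sh and start-yarn.sh:
#   jps | missing_daemons
```

No output means all five daemons were found. A missing name is a hint to look at that daemon's log file under /usr/local/hadoop/logs.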
Run Hadoop Example
hduser@ubuntu:~$ cd /usr/local/hadoop
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5
Browse the web interface
Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/