Friday, January 27, 2017

Hadoop: Installation on a Single Node


Prerequisites

Java installation

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

The JDK is installed at /usr/lib/jvm/java-8-oracle.
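Rather than relying on a fixed path, you can derive the JDK directory from the java binary on your PATH. This is a sketch assuming the usual JDK layout (.../jre/bin/java or .../bin/java); the function name java_home_from_bin is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a JAVA_HOME value from the resolved path of the
# java binary by stripping the trailing /bin/java and an optional /jre.
java_home_from_bin() {
  local p="$1"
  p="${p%/bin/java}"   # drop the binary itself
  p="${p%/jre}"        # JDK layouts keep the runtime under jre/
  printf '%s\n' "$p"
}

# Usage on the real machine:
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
```

With oracle-java8-installer this typically prints /usr/lib/jvm/java-8-oracle, the value used for JAVA_HOME later on.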


Adding a dedicated Hadoop system user

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This adds the group hadoop and the user hduser (a member of hadoop) to the local machine.

Configuring SSH


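Hadoop's start-up scripts use SSH to launch the daemons, so hduser needs password-less SSH access to localhost. First install SSH:

$ sudo apt-get install ssh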


user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$


hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$
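Running the cat >> command more than once appends a duplicate key line. The sketch below is an idempotent variant; authorize_key is a hypothetical name, and the 700/600 permissions are chosen because they satisfy sshd's default StrictModes checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file only if
# it is not already present, then tighten permissions on the file and its dir.
authorize_key() {
  local pub="$1" auth="$2" key
  key="$(cat "$pub")"
  mkdir -p "$(dirname "$auth")"
  chmod 700 "$(dirname "$auth")"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
  chmod 600 "$auth"
}

# Usage as hduser:
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```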



Disabling IPv6


To disable IPv6 on Ubuntu, open /etc/sysctl.conf in an editor (as root) and add the following lines to the end of the file:


# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1


You have to reboot your machine in order to make the changes take effect.
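As an alternative to rebooting, sudo sysctl -p reloads /etc/sysctl.conf immediately. Either way, the kernel's live value can be checked afterwards; ipv6_disabled below is a hypothetical helper that reads it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if the given sysctl file reads 1.
# Defaults to the kernel's live setting for all interfaces.
ipv6_disabled() {
  local f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# Usage after the reboot (or after "sudo sysctl -p"):
#   ipv6_disabled && echo "IPv6 is disabled"
```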

Hadoop Installation

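Download the Hadoop 2.7.3 release (hadoop-2.7.3.tar.gz) into /usr/local, then extract it, rename the directory and give hduser ownership of it: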

$ cd /usr/local
$ sudo tar xzf hadoop-2.7.3.tar.gz
$ sudo mv hadoop-2.7.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
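For the download itself, older releases are kept on the Apache archive. The sketch below assumes the archive layout .../hadoop-&lt;version&gt;/hadoop-&lt;version&gt;.tar.gz; hadoop_tarball_url is a hypothetical helper, so check the URL in a browser if the download fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the download URL for a Hadoop release, assuming
# the Apache archive layout .../hadoop-<ver>/hadoop-<ver>.tar.gz.
hadoop_tarball_url() {
  local v="$1"
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' "$v" "$v"
}

# Usage, before the tar step:
#   cd /usr/local && sudo wget "$(hadoop_tarball_url 2.7.3)"
```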


Update $HOME/.bashrc

As root, open hduser's .bashrc:

$ sudo gedit /home/hduser/.bashrc

and add the following lines:


# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
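The new variables only apply to shells started after the edit, or after running source ~/.bashrc. Since start-dfs.sh and start-yarn.sh live in /usr/local/hadoop/sbin and hdfs in /usr/local/hadoop/bin, a quick PATH check catches a wrong entry early. path_has is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 is an exact entry of the
# colon-separated list $2 (defaults to the current PATH).
path_has() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage as hduser, after "source ~/.bashrc":
#   path_has /usr/local/hadoop/sbin || echo "start-dfs.sh will not be found"
```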


Configuration
Edit the following files as root.
hadoop-env.sh

$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

and change the line

export JAVA_HOME=${JAVA_HOME}

to

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
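Hard-coding the path here is usually recommended because the daemons are started over SSH and may not see the JAVA_HOME from your interactive shell. The same edit can be made without an editor; set_java_home is a hypothetical helper that assumes GNU sed (for -i).

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the uncommented "export JAVA_HOME=..." line of a
# hadoop-env.sh file in place. Commented lines (starting with #) are left alone.
set_java_home() {
  local file="$1" jdk="$2"
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk}|" "$file"
}

# Usage as hduser (who owns /usr/local/hadoop after the chown step):
#   set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-oracle
```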

core-site.xml

This setup uses the following ports for the different services:
54310: HDFS (NameNode).
54311: MapReduce JobTracker.

Run the following commands to create the directory where Hadoop will store its data files (/home/hduser/tmp):


$ sudo mkdir /home/hduser/tmp
$ sudo chown hduser:hadoop /home/hduser/tmp
$ sudo chmod 750 /home/hduser/tmp


$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml


Add the following lines between the <configuration> and </configuration> tags:


<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>The base for other Hadoop temporary directories and files.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The URI of the default file system.
</description>
</property>
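To check what Hadoop actually resolves for a key, hdfs getconf -confKey &lt;key&gt; is the reliable route once Hadoop is on the PATH. For a quick look at the file itself, the hypothetical get_prop helper below works only for the one-tag-per-line layout used in this post; it is not a general XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of the property whose <name> equals $2
# in the site file $1. Assumes <name> and <value> each sit on their own line
# (or together on one line), as in the snippets above.
get_prop() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n)
      found = (n == key)          # remember whether this is the wanted property
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# Usage:
#   get_prop /usr/local/hadoop/etc/hadoop/core-site.xml hadoop.tmp.dir
```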



mapred-site.xml


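In Hadoop 2.x this file ships only as a template, so create it first and then open it:

$ sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml

Add the following lines between the <configuration> and </configuration> tags (see Figure 6):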

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port where the MapReduce job tracker runs at.</description>
</property>



hdfs-site.xml


$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following lines between the <configuration> and </configuration> tags:


<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
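Editing each file in gedit works, but the same properties can also be written non-interactively. The helpers below (hadoop_property and write_site_xml, both hypothetical) generate a complete *-site.xml from name/value pairs; note that write_site_xml overwrites the target, including the license header of the shipped file, so keep a backup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print one Hadoop <property> element, and write a whole
# *-site.xml file from name/value pairs. write_site_xml OVERWRITES its target.
hadoop_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

write_site_xml() {
  local file="$1"
  shift
  {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    while [ "$#" -ge 2 ]; do      # consume the arguments two at a time
      hadoop_property "$1" "$2"
      shift 2
    done
    printf '</configuration>\n'
  } > "$file"
}

# Usage as hduser, after backing up the original file:
#   write_site_xml /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication 1
```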


Format the NameNode

As hduser, format HDFS once, before the first start:

$ hdfs namenode -format
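Formatting again later creates a new, empty namespace with a new cluster ID, after which the existing DataNode typically refuses to start. A small guard avoids that. It assumes the default dfs.namenode.name.dir, which lives under hadoop.tmp.dir as dfs/name; namenode_formatted is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed if NameNode metadata already exists under the
# given hadoop.tmp.dir (default layout: <tmp>/dfs/name/current/VERSION).
namenode_formatted() {
  [ -f "$1/dfs/name/current/VERSION" ]
}

# Usage as hduser, passing the hadoop.tmp.dir value from core-site.xml:
#   namenode_formatted /path/to/hadoop.tmp.dir || hdfs namenode -format
```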


Start Hadoop Services


$ start-dfs.sh

....

$ start-yarn.sh
....

$ jps

If everything started successfully, jps should list the following processes (the process IDs will differ on your machine):
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
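Instead of comparing the list by eye, a hypothetical check_daemons helper can confirm that all five daemons appear in the jps output.

```shell
#!/usr/bin/env bash
# Hypothetical check: given jps output, report any expected Hadoop daemon that
# is missing (on stderr). Returns 0 only if all five are present.
check_daemons() {
  local out="$1" missing=0 d
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the name column as a whole line.
    if ! printf '%s\n' "$out" | awk '{print $2}' | grep -qx "$d"; then
      echo "missing: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage as hduser:
#   check_daemons "$(jps)" && echo "all Hadoop daemons are up"
```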


Run Hadoop Example

hduser@ubuntu:~$ cd /usr/local/hadoop

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5

The pi example estimates the value of pi using 2 map tasks with 5 samples each.
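The examples jar name carries the Hadoop version, so a command copied from another release fails with a missing-file error. A hypothetical examples_jar helper can glob for whichever version is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the MapReduce examples jar under a
# Hadoop install directory (default /usr/local/hadoop), whatever its version.
examples_jar() {
  local home="${1:-/usr/local/hadoop}" jar
  for jar in "$home"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    # If nothing matches, the loop sees the literal pattern, which is not a file.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Usage from /usr/local/hadoop:
#   bin/hadoop jar "$(examples_jar)" pi 2 5
```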

Browse the web interface

Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/

Set a password for hduser (the one used here is hello@123):

$ sudo passwd hduser

