HADOOP installation made easy

Mithun_K · ‎04-02-2014

SAP HANA can connect to HADOOP through Smart Data Access, and the first thing we need for that is a working HADOOP installation.

This blog talks about the HADOOP installation.

The installation takes at most 2 hours if you are lucky :smile:

Please follow the below steps:

Step-1:

1. Download a stable release ending with tar.gz (hadoop-1.2.1.tar.gz)

2. In Linux, create a new folder “/home/hadoop”

3. Move the downloaded file to the folder “/home/hadoop” using WinSCP or FileZilla.

4. In PuTTY, type: cd /home/hadoop

5. Type: tar xvf hadoop-1.2.1.tar.gz
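The extraction in Step-1 can also be wrapped in a small shell function so that it is repeatable. This is only a sketch: the function name unpack_hadoop is hypothetical, and it assumes the archive has already been copied to the server.

```shell
#!/bin/sh
# Hypothetical helper: unpack a Hadoop tarball into a target folder.
# Usage: unpack_hadoop <tarball> <target-folder>
unpack_hadoop() {
    tarball=$1
    target=$2

    # Stop early if the archive is missing (for example, the copy failed).
    if [ ! -f "$tarball" ]; then
        echo "Archive not found: $tarball" >&2
        return 1
    fi

    # Create the folder if needed, then extract the archive into it.
    mkdir -p "$target" || return 1
    tar xzf "$tarball" -C "$target"
}

# Example call, matching the paths used in this blog:
# unpack_hadoop /home/hadoop/hadoop-1.2.1.tar.gz /home/hadoop
```

The early check gives a readable message instead of a tar error when the file never arrived on the server.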

Step-2:

Downloading and setting up java:

1. Check if Java is present

Type: java -version

2. If java is not present, please install it by following the below steps

3. Make a directory where we can install Java (/usr/local/java)

4. Download 64-bit Linux Java JDK and JRE ending with tar.gz from the below link:

http://oracle.com/technetwork/java/javase/downloads/index.html

5. Copy the downloaded files to the created folder

6. Extract and install java:

Type: cd /usr/local/java

Type: tar xvzf jdk*.tar.gz

Type: tar xvzf jre*.tar.gz

7. Add the variables for the path and home directories at the end of the /etc/profile file:

JAVA_HOME=/usr/local/java/jdk1.7.0_40

PATH=$PATH:$JAVA_HOME/bin

JRE_HOME=/usr/local/java/jre1.7.0_40

PATH=$PATH:$JRE_HOME/bin

HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1

PATH=$PATH:$HADOOP_INSTALL/bin

export JAVA_HOME

export JRE_HOME

export HADOOP_INSTALL

export PATH

8. Run the below commands so that Linux can understand where Java is installed:

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1

sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1

sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java

sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac

sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws

9. Test Java by typing: java -version

10. Check if JAVA_HOME is set by typing: echo $JAVA_HOME

Now we are done with the installation of Hadoop (standalone mode).
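If the java -version or echo $JAVA_HOME checks give unexpected results, a small function like the one below can narrow down the cause. It is a sketch only; the name check_java_home is hypothetical and the function simply looks for an executable java launcher under the folder it is given.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a JDK folder looks usable.
# Usage: check_java_home <java-home>
check_java_home() {
    java_home=$1

    # JAVA_HOME must be set to a non-empty value.
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi

    # The java launcher must exist under bin/ and be executable.
    if [ ! -x "$java_home/bin/java" ]; then
        echo "No executable java found in $java_home/bin" >&2
        return 1
    fi

    echo "Java found at $java_home/bin/java"
}

# Example call after logging in again so that /etc/profile is re-read:
# check_java_home "$JAVA_HOME"
```

A common reason for an empty JAVA_HOME is that the current session was opened before /etc/profile was edited.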

Step-3:

We can check if we are successful by running an example.

Go to Hadoop Installation directory

Type: mkdir input

Type: cp conf/*.xml input

Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Type: ls output/*

If the job succeeded, the output files are listed.
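To look at the result itself and not just the file names, the check can be sketched as a function. Two things here are assumptions and not taken from the steps above: the name show_job_result is hypothetical, and the function expects a finished job to leave a _SUCCESS marker next to its part-* result files.

```shell
#!/bin/sh
# Hypothetical helper: print the results of a finished MapReduce job.
# Assumption: a finished job leaves a _SUCCESS marker beside its part-* files.
# Usage: show_job_result <output-folder>
show_job_result() {
    out_dir=$1

    # Without the marker, treat the run as incomplete.
    if [ ! -f "$out_dir/_SUCCESS" ]; then
        echo "No _SUCCESS marker in $out_dir; the job may have failed" >&2
        return 1
    fi

    # Print every result file the job wrote.
    cat "$out_dir"/part-*
}

# Example call from the Hadoop installation directory:
# show_job_result output
```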

Step-4:

As a next step, change the configuration in the below files:

1. In the Hadoop installation folder, change the conf/core-site.xml file to:

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

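2. Change conf/hdfs-site.xml (a single-node setup has only one DataNode, so the replication factor is 1):

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>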

</property>

</configuration>

3. Change conf/mapred-site.xml:

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

</configuration>

4. Edit the conf/hadoop-env.sh file:

export JAVA_HOME=/usr/local/java/jdk1.7.0_40
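The three site files in this step share the same shape: a configuration element that wraps one property with a name and a value. As an illustration only, the function below writes such a file from the command line. The name write_site_xml is hypothetical, and the function overwrites the target file, so it only suits a fresh installation.

```shell
#!/bin/sh
# Hypothetical helper: write a Hadoop site file holding a single property.
# Note: this replaces the whole file.
# Usage: write_site_xml <file> <property-name> <property-value>
write_site_xml() {
    file=$1
    name=$2
    value=$3

    # The heredoc delimiter is unquoted so that $name and $value are expanded.
    cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Example call from the Hadoop installation directory:
# write_site_xml conf/core-site.xml fs.default.name hdfs://localhost:9000
```

Values containing characters such as & or < would need XML escaping, which this sketch does not do.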

Step-5:

1. Set up passwordless SSH by running the below commands:

Type: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Type: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

2. Check that SSH no longer asks for a password:

Type: ssh localhost (It should not ask for a password)

3. Format the name node:

Type: bin/hadoop namenode -format
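One detail about the SSH setup in this step: the cat ... >> command appends the key again each time it is run. If the steps are likely to be repeated, the append can be guarded as in the sketch below. The function name authorize_key is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to authorized_keys only once.
# Usage: authorize_key <public-key-file> <authorized-keys-file>
authorize_key() {
    pub_file=$1
    auth_file=$2

    # Make sure the target file exists so grep has something to read.
    touch "$auth_file" || return 1

    # -F: fixed string, -x: whole line, -q: quiet, -f: read patterns from file.
    if grep -qxFf "$pub_file" "$auth_file"; then
        echo "Key already authorized"
    else
        cat "$pub_file" >> "$auth_file"
        echo "Key added"
    fi

    # sshd usually ignores an authorized_keys file that others can write to.
    chmod 600 "$auth_file"
}

# Example call, matching the key generated above:
# authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```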

Step-6:

To start all the Hadoop services:

Type: bin/start-all.sh
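After the start script returns, it helps to confirm that the daemons are really running. The sketch below reads the output of jps (a tool shipped with the JDK that prints one "pid name" line per Java process) and prints the names of any Hadoop 1.x daemons that are missing. The function name missing_daemons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list expected Hadoop 1.x daemons missing from jps output.
# Reads "<pid> <name>" lines (the format jps prints) from standard input.
missing_daemons() {
    # Keep only the name column of what was read.
    running=$(awk '{print $2}')

    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Example call on a machine with the JDK on the PATH:
# jps | missing_daemons
```

No output means all five daemons were found. If a name is printed, the matching log file in the logs folder of the Hadoop installation is the first place to look.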

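Now try the same example which we tried earlier. The job now reads its input from HDFS, so copy the input files there first:

Type: bin/hadoop fs -put conf input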

Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

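The job should complete and write its results to the output directory in HDFS. To view them:

Type: bin/hadoop fs -cat output/*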

To stop all the Hadoop services:

Type: bin/stop-all.sh

Now the installation of HADOOP is successful. :smile:
