To access Hive from HANA, Hadoop and Hive must be installed first. The first part below covers the installation of Hadoop; the installation of Hive is introduced in the second part.


1. Download Hadoop and move to directory

Download Hadoop from an Apache Hadoop mirror: http://hadoop.apache.org/releases.html#Download

In this case, we choose Hadoop-2.2.0.

Unzip the downloaded Hadoop package and move the Hadoop folder to the directory where you want it installed.

tar -zxvf  hadoop-2.2.0.tar.gz

Switch to your HANA server user:

su hana_user_name

We need to install Hadoop under the HANA user, because the HANA server needs to communicate with Hadoop as the same user.

If you just want to set up Hadoop without accessing it from HANA, you can instead create a dedicated Hadoop account with “addgroup” and “adduser” (the exact commands depend on the system; SUSE and Ubuntu use different ones), as sketched below.
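A minimal sketch of creating such an account, assuming Ubuntu-style commands and the example names hadoop (group) and hduser (user), which you can replace with whatever you prefer:

# create a dedicated group and user for Hadoop (names are only examples)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser

# give the new user ownership of the Hadoop installation directory
sudo chown -R hduser:hadoop /usr/local/hadoop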

2. Check Java

Before we install Hadoop, we should make sure Java is installed.

Use:

java -version

to check Java, and find the Java path with

whereis java

Then write the following lines in $HOME/.bashrc to add your Java path:

export JAVA_HOME=/java/path/

export PATH=$PATH:/java/path/
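To confirm the new settings are picked up, reload the profile and check (the exact version output depends on your installation):

source $HOME/.bashrc
echo $JAVA_HOME
java -version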


3. Passwordless SSH

Install ssh first if you don’t have it.

Type the following commands in the console to create a key pair and append the public key to the authorized keys:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
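You can then verify that passwordless login works; the first connection may ask you to confirm the host key, but it should not prompt for a password:

ssh localhost
exit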

4. Add path to Hadoop

Write the following script in $HOME/.bashrc if you want to add the Hadoop path permanently.

Open the .bashrc file by

vi $HOME/.bashrc

Add the following lines:

export HADOOP_INSTALL=/hadoop/path/

For the Hadoop path: I put the Hadoop folder under /usr/local, so in my case I use /usr/local/hadoop instead of /hadoop/path/.

export PATH=$PATH:$HADOOP_INSTALL/bin

export PATH=$PATH:$HADOOP_INSTALL/sbin
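After reloading the profile, the hadoop command should be found on the path:

source $HOME/.bashrc
hadoop version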

5. Hadoop configuration

Find the configuration files core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml and hadoop-env.sh in the Hadoop folder. They are located in $HADOOP_INSTALL/etc/hadoop/. If you cannot find one of the xml files, you may simply copy its template file in the same folder. For example:

cp mapred-site.xml.template mapred-site.xml

Some other tutorials say you can find them under a /conf/ directory; /conf/ was used by older Hadoop versions, but in hadoop-2.2.0 the files are under /etc/hadoop/.
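Listing the directory shows the files to edit (the path assumes the /usr/local/hadoop location used above):

ls /usr/local/hadoop/etc/hadoop/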

Modify the configuration files as follows:

vi core-site.xml

Put the following between the <configuration> tags:

<property>
  <name>fs.default.name</name>
  <value>hdfs://your_computer_name_or_IP:8020</value>  <!-- localhost would also work -->
</property>

vi hdfs-site.xml

Put the following between the <configuration> tags:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/namenode/dir</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/datanode/dir</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
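Note that /namenode/dir and /datanode/dir above are placeholders; the directories you point to must exist and be writable by the Hadoop user. A sketch, assuming directories under /usr/local/hadoop_data (an arbitrary choice):

# create local directories for the namenode and datanode and hand them to the Hadoop user
sudo mkdir -p /usr/local/hadoop_data/namenode /usr/local/hadoop_data/datanode
sudo chown -R hana_user_name /usr/local/hadoop_data

You would then use file:/usr/local/hadoop_data/namenode and file:/usr/local/hadoop_data/datanode as the values.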

vi yarn-site.xml

Put the following between the <configuration> tags:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>your_computer_name_or_IP</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

vi mapred-site.xml

Put the following between the <configuration> tags:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

For more information about all these properties, please check:

http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml

http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-def...

http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

vi hadoop-env.sh

Add the following two statements at the end of this file:

export HADOOP_COMMON_LIB_NATIVE_DIR=/hadoop/path/lib/native

export HADOOP_OPTS="-Djava.library.path=/hadoop/path/lib"

6. Start Hadoop

The last thing to do before starting Hadoop is to format your namenode, simply by:

hadoop namenode -format

In the end, you can start Hadoop by calling “start-all.sh”; you can find this script in /hadoop/path/sbin.
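If you prefer the non-deprecated scripts (start-all.sh is marked deprecated in Hadoop 2.x), starting HDFS and YARN separately is equivalent; the paths assume the /usr/local/hadoop location used above:

# start the HDFS daemons, then the YARN daemons
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh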

To check that your Hadoop has started, type

jps

You should see NameNode, NodeManager, DataNode, SecondaryNameNode and ResourceManager running.

Alternatively, you can also check whether Hadoop is running by visiting localhost:50070 for Hadoop file system information and localhost:8088 for cluster information.
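If the server has no browser, a quick check from the console works as well; HTTP 200 responses indicate the two web UIs are up:

# check the NameNode and ResourceManager web UIs
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088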


You may find in some tutorials that localhost:50030 contains JobTracker information. However, localhost:50030 does not exist in hadoop-2.2.0, because hadoop-2.2.0 splits the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. Don’t worry about localhost:50030 not working.
