This is a companion to my earlier blog, where I demonstrated HADOOP HBASE records being read by HANA and presented using SAPUI5.

Tip of the iceberg: Using Hana with Hadoop Hbase


In this blog I expand on this topic further by enabling HANA to read and write to HBASE.

I’ve created a simple application where HBASE is used to capture the change log of a HANA table.

But why, you might say?

HADOOP is designed for BIG Data (e.g. at the Petabyte scale), at it’s core it uses disks for storage. It’s still very much a ‘do it yourself’ technology with very little Enterprise ready applications built on top yet.

HANA is arguably designed for LARGE Data (e.g. at  the Terabyte scale) and makes use of the latest In-Memory technology. By contrast it has a huge catalog of SAP software now running on it.

The two technologies are complementary in an Enterprise Data Architecture.

Many modern website (e.g. Facebook, Twitter, Linkedin) use a complex combination of HADOOP, traditional RDBMS, custom development and a wide collection of website scripting tools.  The aspiration for many of these websites is millions or billions of users.

For Enterprises, who don’t consider IT their core business, then venturing down this path may not be the most straightforward option. HANA is more that just a database, it also provides the platform for building and deploying custom Enterprise Applications for Desktop and Mobile. In this case a far simpler BIG Data Architecture might be to use just HANA & HADOOP. Enterprise applications may be only targeted to 100’s to 10K’s of users. 

In the following example I’ve built a very simple application on HANA, which enables a table to be maintained on a HTML/5 webpage.  For audit and control purposes I wanted to keep a log of all the changes to Column values by the users.  I could have implemented this purely in HANA, but to demonstrate the integration potential of HANA and HADOOP HBASE, I have opted to also write the changes to HBASE.  On a small scale the benefit of this is negligible, but on a larger scale there may considerable cost saving for storing low value data on an open source disk based solution such as HADOOP.

The following diagram represents the technical data flow of my example:


Now for an example.

The following shows the DOCUMENTS table in HANA, and HBASE Table Log prior to a change:

In HBASE the change log ( for Doc_Name DOC1 & Row 1)  is stored as:

Now  I make a change, e.g. Changing the Free Text from ‘APPLE’ to ‘APPLE CIDER’

Update Successful!  (Changes written to HANA and HBASE)

From SAPUI5 the HBASE Change Log appears as:

Above you can now see the history of the of the 'Free Text' field

In HBASE the Change log Table appears as:

NOTE:  the HADOOP User Interface (HUE) only the latest change is shown, however behind the scenes I’ve defined the HBASE table to store up to 100 changes (versions) of the an individual column.

I can also check these in Hbase Stargate directly, though they are BASE64 encoded:

OR by checking the HbaseTableLog xsjs I created, using the GET Method (which decodes):

NOTE: I used POSTMAN here to help format the returned JSON to make a bit easier to test. Above you’ll see the history for the Free_Text field.

The key feature to get this prototype working is the HANA SPS07 functionaility which enables XS Javacript libraries (xsjslib)  to be called  on ODATA CRUD  events.

E.g. DOCUMENTS.xsodata

service {


     update using "HanaHbase:DOCUMENTS_table_exits.xsjslib::update_instead"



NOTE:  For the comparison of Before and After record, to determine the field changed, I’ve made use of the example code provided by Thomas Jung , for convert a SQL record set to a JSON object, see

During the ODATA PUT (UPDATE) I’ve modified both HANA & HBASE with the most recent change.

The HANA table is only setup to store the current value.

The HBASE equivalent table  I’ve defined to keep the most recent 100 changes per COLUMN.

The HBASE Log table was defined as:

create 'HanaTableLog', {NAME => 'table', VERSIONS => 100}

The complete code for this prototype is available on github.

If you are modelling HANA views based on a Suite (like ERP, CRM, SRM) on HANA system, you would probably like to have the table and table field descriptions from the ABAP Dictionary available to you in the HANA Studio Modeler/Editor. Once this ABAP report has been ran the table and column description are shown in the HANA Studio editor and automatically copied to the semantic layer:



Run the following ABAP report in the background to update the descriptions in the HANA database based on the ABAP Dictionary. Only works on NW 7.4 (ERP Ehp7 or BW 7.4 or CRM 7 Ehp 3) system:


REPORT z_hana_set_table_comments.
DATA: lt_tables       TYPE TABLE OF dd02t,
      lv_table        TYPE dd02t,
      lt_dfies        TYPE TABLE OF dfies,
      lv_dfies        TYPE dfies,
      lv_fieldcomment TYPE string.
SELECT t~tabname t~ddtext into corresponding fields of table lt_tables FROM dd02t AS t
  INNER JOIN dd02l AS l ON l~tabname = t~tabname WHERE t~as4local = 'A' AND t~ddlanguage = sy-langu AND l~tabclass = 'TRANSP'.
LOOP AT lt_tables INTO lv_table.
      NEW cl_sql_statement( )->execute_query(
           ` COMMENT ON TABLE "` && lv_table-tabname && `" is '` && lv_table-ddtext && `' ` ).
          tabname        = lv_table-tabname
          langu          = sy-langu
          dfies_tab      = lt_dfies
          not_found      = 1
          internal_error = 2
          OTHERS         = 3.
      IF sy-subrc = 0.
        LOOP AT lt_dfies INTO lv_dfies.
              lv_fieldcomment = cl_abap_dyn_prg=>escape_quotes( CONV string( lv_dfies-fieldtext ) ).
              NEW cl_sql_statement( )->execute_query(
                         ` COMMENT ON COLUMN "` && lv_table-tabname && `"."` && lv_dfies-fieldname && `" IS '` && lv_fieldcomment && `' ` ).
            CATCH cx_sql_exception INTO DATA(oreffield).
              WRITE: / 'Error: ', oreffield->get_text( ).
               COMMIT WORK.
    CATCH cx_sql_exception INTO DATA(oref).
WRITE: / 'Table and field comments updated.'.

We have a Chinese version (SAP River ()SAP River概述) of this blog.


The most common architecture of SAP HANA application is like below(From SAP River Datasheet):


Figure 1: Traditional SAP HANA application architecture

   With the traditional architecture shown in Figure 1, application developer is responsible for both creating  data model in database level  and implementing control logic in XS level. SQL and SQL Script are required to create data model in SAP HANA database, while xsjs needed to implement control logic in XS level. Therefore, the developer of SAP HANA application at least needs to grasp two technologies to finish a SAP HANA application. Sometimes, developing a SAP HANA application needs the cooperation of two or more developers. SAP River is an option to void this problem.

   SAP River is a brand new method to develop SAP HANA application. SAP River consists of a programming language, a programming model and a suit of development tools. With help of SAP River, developer can concentrate on the design of business intent, ignoring how is it implemented and optimized in SAP HANA database.  SAP River only exists in design-time. When SAP River objects activated, all SAP River code is compiled into SQL Script or XSJS code and then transferred to XS engine indexserver for execution.


Figure 2: Function model of SAP River

The function model is shown in Figure 2 (From SAP River Datasheet). SAP River integrates all the segments involved in development of SAP HANA application, including data modeling, control logic and access control, which makes developer capable of accomplishing a SAP HANA application grasping only one single technology. Firstly, developer can design the data model for SAP HANA application with SAP River language. During compilation, SAP River will create corresponding database objects for the data model you designed using SAP River language. For example, the entity in SAP River program will be mapped to a table in SAP HANA database. Secondly, using SAP River language, developer can define methods for data objects, which implements business logic. Last but not least, SAP River provides developer a way to design the access control for data and method.

SAP River Language

      As a new generation of SAP HANA application development method, SAP River provides a strongly typed and declarative programing language. With this programming language, even a non-computer-major developer can also easily develop a SAP HANA application. In addition, SAP River also supports embedded SQL Script and XSJS code in SAP River program, which is useful for some complicated logic.

      SAP River Language mainly includes:

  1. Application: Application is the largest object in SAP River Language, all other objects must be included in an application. Objects in an application can be exposed to external applications via some ways, such as OData.
  2. Entity: Entity is used to store data in database. Usually, one entity is mapped to a table in SAP HANA Database. An entity consists of one or more elements. Usually, one element is mapped to a column of a table. Each element has its own type, the type here can be fundamental type such as integer, string, and can also be a custom type or entity.
  3. Types: Type defined the size and validity of data. SAP River supports different kinds of types, including fundamental type, structured type and stream type. Each entity will automatically define a corresponding implicit type, which has same name with the entity.
  4. Action: Action is similar to the function or method in other programming languages. Usually, SAP River define the business logic of the application in the actions.
  5. View: With the help of view, you can create a data stream using select statement, this data stream is a subset of target data set. The data of this data stream will be dynamically extracted from target data set when the data stream is used.
  6. Role: Role can be created in SAP River code, and assigned some privileges to this role. In this way, access control is implemented.


Figure 3. SAP River example program: Hello World

      As a programming language, SAP River also provides some libraries, which contains many common functions. These libraries can be divided to some categories according to their functionality:

  1. Math Functions: mathematic calculation functions, such as absolute, ceil, floor, etc.
  2. Integer Functions: process the data of integer type, such as toDecimalFloat, toString, etc.
  3. String Functions: deal with strings, such as length, substring, etc.
  4. Date Functions: date functions, such as day, month, etc.
  5. Session Functions: session functions, such as getUserName.
  6. Logging Functions: log function, such log.
  7. Utility Functions: some utility functions, such as parseDecimalFloat, parseInt, etc.
  8. HTTP Calls: Http library is used to send Http request in SAP River code, such as OData or REST services.

OData Calls

      SAP River can expose the data and business logic to client via OData protocol. SAP River can expose data to client either in application level or in namespace level via OData. If exposed in namespace level, then every entities, views, and actions in this namespace are exposed to client. If exposed in application level, then only the objects that are tagged for exposure are exposed to client. Here, let’s take application level as example to illustrate how to expose data via OData.


Figure 4. Expose SAP River application via OData

   ① when expose object in application level, you must use the keyword export to specify which objects are to be exposed to client. Such as Employee in TestSyntax here.

   ② There are more than one ways to expose SAP River object, OData is one of them. So it is necessary to add a notation “@OData” to declare the way of exposure. SAP River will create corresponding OData service for the application or namespace.

   ③ By Default, SAP River object is private. To let specified users to access certain object, you need to use the keyword “accessible by” to tell which role or privileges are required to access the object.  If every users are allowed to access, you can just use “accessible by sap.hana.All”.

How to learn SAP River

      SAP HANA begin to support SAP River from SPS07, we have many materials to study SAP River. Here are some materials for studying SAP River:

  1. SAP River Datasheet
  2. SAP River Help Docs
  3. Videos and PPTs introducing SAP River
  4. Series of videos introducing SAP River in SAP Academy


        In this blog, we talked about functionality of SAP River, advantage and structure, including the advantages of SAP River over traditional development framework, main features of SAP River and how to expose data to client via OData.


  1. SAP River Datasheet
  2. SAP River Language Reference
  3. SAP River Developer Guide

We have a Chinese version(SAP HANA Smart Data Access(三)——如何利用SDA通过Hive访问Hadoop数据) of this blog.


      In previous blog of this series, we talked about how to install and configure the data source of SDA in SAP HANA Server side. As most data sources supported by SAP HANA are all database, the procedure of installation and configuration is similar. But for Hadoop data source, something is different. As a distributed data processing platform, Hadoop usually store data in HDFS file system, or in NoSQL database HBase which is also usually based on HDFS. However, both of HDFS and HBase don’t support ODBC protocol. So we need another member of Hadoop family  to solve this problem, it is Hive. Hive implements SQL interface for HDFS and HBase, and HiveODBC driver is also provided. In this blog, we’ll talk about how does SDA access the Hadoop data through Hive.

Deploy Hadoop and Hive

      The Official version of Hadoop supported by SAP HANA SDA is “Intel Distribution for Apache Hadoop version 2.3” (Including Apache Hadoop version 1.0.3 and Apache Hive 0.9.0). Although there’s only one version in the official supported version list, the experiment of this blog shows that SDA can also access the data stored in ordinary Apache version of Hadoop. The experiment of this blog build up a Hadoop cluster containing 3 nodes, and the version of Hadoop and Hive is : Apache Hadoop 1.1.1 and Apache Hive 0.12.0.

      As the guide of deploying Hadoop and Hive can be easily found in internet, we don’t discuss it here. After deploying Hadoop and Hive, some data for experiment needs to be prepared. Here, We use a user information table, the structure of the table is :

Column Name

Data Type











      Data can be imported from csv file into hive table. Firstly, create a table using hive shell.

create table users(USERID string, GENDER string, AGE int, PROFESSION string, SALARY int)

row format delimited

fields terminated by '\t';

      Then, import data from csv file to the users table:

load data local inpath '/input/file/path'

overwrite into table users;

Here, the data is imported from local file system, Hive can also import data from HDFS. In this experiment, the number of records in users table is 1,000,000. After importing, count the record number:


As shown in the picture above, Hive call the MapReduce to query data, and it takes 14.6 seconds to count the record number of users table. Afterwards, select out the top 10 records of the table:


As we see, it takes 0.1 second.

Installing and configuring HiveODBC Driver

      Same as installing driver for other data sources, installing HiveODBC driver also requires unixODBC installed in the SAP HANA Server side. HiveODBC requires unixODBC-2.3.1 or newer version. For more details about installing unixODBX, please see the reference [2].

      unixODBC installed, Begin to install HiveODBC driver. As introduced in reference [2], we use the HiveODBC provided by Simba Technologies. The procedure of installing is like below:

  1. Download Simba HiveODBC driver package, and decompress the package to a certain directory. Then enter the directory: /<DRIVER_INSTALL_DIR>/samba/hiveodbc/lib/64 (use 32 to replace 64 if it’s for 32-bit system) to check the driver file
  2. Login SAP HANA Server as sidadm.
  3. Execute “HDB stop” to stop the SAP HANA.
  4. Copy the file “/<DRIVER_INSTALL_DIR>/simba/hiveodbc/Setup/simba.hiveodbc.ini” to the home directory of sidadm.
  5. Edit the ~/.simba.hiveodbc.ini with vim.
  6. If there’s one row “DriverManagerEncoding=UTF-32”, change it to UTF-16.
  7. Check the ErrorMessagePath = /<DRIVER_INSTALL_DIR>/simba/hiveodbc/ErrorMessages, correct it if it doesn’t points to right path.
  8. Comment out the row:, and add a new row:
  9. Edit the .odbc.ini file in home directory of sidadm, add a new DSN for hive, the default port for hive is 10000, here’s an example:





   10. Edit the file $HOME/ to set some environment variable:


/< DRIVER_INSTALL_DIR>/simba/hiveodbc/lib/64/

export ODBCINI=$HOME/.odbc.ini

   11. Use isql to check whether SAP HANA Server can connect to remote data source successfully:

isql –v hive1

   12. If connect successfully, execute “HDB start” to start SAP HANA.

Create hive data source

      When installing and configuring HiveODBC finished, create the hive data source in SAP HANA Studio following the steps introduced in Reference [1]. Here, you need to choose the HIVEODBC as the adapter.

      After hive data source created, you can view the tables in hive, as shown in picture below:


Query Hive virtual table

      Add a new virtual table which maps to table users in Hive following the steps introduced in Reference [1]. Then count the record number of the virtual table:


As shown above, it takes 14.1 seconds to count the virtual table in SAP HANA Studio, which is close to the time it cost in Hive side. The result tells us that SAP HANA SDA doesn’t influence the performance of operation in remote data source when little data transmission involved.


      In this blog, we illustrate how SAP HANA SDA access the Hive table stored in Hadoop using a simple example. Hive is a tool which provide SQL interface for Hadoop. From the experiment result, querying the virtual table in SAP HANA Studio and querying the Hive table in Hive side is very close in performance when little data transmission involved.


  1. SAP HANA Smart Data Access1)——A brief introduction to SDA
  2. SAP HANA Smart Data Access2)——How to install and configure the data source driver of SDA
  3. Section 6.1.1 of SAP HANA Administrator Guide:

We have a Chinese version(SAP HANA Smart Data Access(二)——SDA数据源驱动的安装与配置) of this blog.


      In the blog “SAP HANA Smart Data Access (1):A brief introduction to SDA”, we introduced the architecture of SDA and talked about how to add remote data source for SDA in SAP HANA Studio. Before adding remote data source for SDA, it is necessary to finish the installation and configuration the ODBC manager and ODBC driver for SDA date source in SAP HANA server side. For different SDA data sources, the process of installing and configuring ODBC driver is similar. In this article, we take the Oracle data source as an example to talk about how to install and configuration ODBC driver.

Installation of unixODBC driver manager

      Since SAP HANA SDA communicates with remote data source using ODBC protocol, the ODBC driver manager must be installed in SAP HANA server side. Usually, the unixODBC is chosen as the driver manager for SAP HANA SDA. Software package of unixODBC can be downloaded from website: . Please note that the version of unixODBC needs to be 2.3.0 for SQL Server data source, and 2.3.1 or newer version is required for other data sources. The process of installing unixODBC is below:

  1. Download corresponding version of unixODBC package, the name of package is unixODBC-x.x.x.tar.gz, x here stands for version number.
  2. Login in the SAP HANA server as the root user, decompress the unixODBC package to specified folder.
  3. Enter the folder specified in step 2, then execute the commands below in order:



make install

   4. unixODBC should be installed by now, you can execute “isql --version” to check whether the unixODBC is installed successfully.

Installation of ODBC driver for data source

      So far, the data sources supported by SAP HANA SDA includes: SAP HANA, SAP Sybase ASE, Teradata database, Oracle, MS SQL Server and Hadoop. The ODBC driver for the database productions, such as Sybase ASE and Oracle, can be downloaded from the official website of the database. For example, you can download ODBC driver for Oracle form website: . As for Hadoop data source, SAP HANA SDA can communicate with it through Hive. More details about connection between SAP HANA SDA and Hadoop in subsequent blogs in this series. And the SAP official recommendation for Hive driver is the HiveODBC driver provided by Simba Technology. Simba HiveODBC can be achieved from the Simba website:

      After downloading the ODBC driver, the driver can be installed according to relevant installation guide. Take Oracle as an example here, two zip packages should be downloaded:, x stands for version number. Then decompress the two packages to a same folder with unzip command. The default unzipped directory is instantclient_xx_x. if everything goes well, you can find the ODBC driver file for Oracle in the unzipped directory, the file name is . By now, the ODBC driver for Oracle database is installed.

Configuration of ODBC data source

      ODBC configuration file needs to be created after installing ODBC data source, and some environment variable should be set accordingly. Let’s also take Oracle as an example:

  1. Login SAP HANA Server as sidadm, sid is the ID of SAP HANA instance.
  2. Enter the home directory of sidadm, create the ODBC configuration file named “.odbc.ini”.
  3. Edit .odbc.ini with vim, the content should be like below:






ORCL_DSN is the name of ODBC data source, which is used by ODBC Manager to find the information for connection. Driver is a keywords used to specify the path of ODBC Driver of data source. For oracle, ServerName is the name of Oracle database which is defined in the file called “tnsnames.ora” located in home directory of sidadm. For other kinds of database, “ServerName” is replaced by other keywords, such as “ServerNode” for SAP HANA, “Server” and “Database” for MS SQL Server.

    4. For Oracle data source, create a file called “tnsnames.ora” in the home directory of sidadm, then edit it with vim, the content should be like below:




                        (ADDRESS = (PROTOCOL = TCP)(HOST = <host_ip>)(PORT = 1521))

                        (CONNECT_DATA =

                                        (SERVICE = DEDICATED)

                                        (SERVICE_NAME =orcl)




    5. Set some environment variables in the shell script file “” of SAP HANA, this shell script file is located in the folder specified by environment variable “DIR_INSTANCE”. Add the commands below to this shell script file:


export LD_LIBRARY_PATH = $LD_LIBRARY_PATH:your_oracle_driver_dir/instantclient_12_1

export LD_LIBRARY_PATH = $LD_LIBRARY_PATH:/usr/local/lib

export ODBCINI = $HOME/.odbc.ini

export TNS_ADMIN=~/   (For Oracle only)


Here, please pay more attention to “LD_LIBRARY_PATH” variable, all the libraries depended by ODBC driver are searched orderly in the folder list specified in the “LD_LIBRARY_PATH” variable. If any folder which contains the depended library is not included in the folder list, ODBC Manager will say can’t find the library file.

     6. Restart SAP HANA, login SAP HANA Server as sidadm, and execute the command “isql –v <DSN> ” to test the connection to data source. If connection succeeded, the configuration is finished. If connection failed, please analyze and process the error according to error message. Some tips for error processing would be introduced in next section.

Trouble shooting

   1. Error message:

[08004][unixODBC][Oracle][ODBC][Ora]ORA-12154: TNS:could not resolve the connect identifier specified


This error is raised by ODBC Manager, it is easily misunderstood as an oracle error. However, this reason of this error is that the environment variable TNS_ADMIN is not set or set incorrectly. The TNS_NAME tells where is the file tnsnames.ora. So, if TNS_ADMIN is not set correctly, the ServerName specified in .odbc.ini can’t be resolved. Then the error is raised.


Set the environment variable TNS_ADMIN to the home directory of sidadm in the

   2. Error message:

[01000][unixODBC][Driver Manager]Can't open lib '/path/to/driver/' : file not found [ISQL]ERROR: Could not SQLConnect


The error message says the file can’t be found, but this file do exist. Let’s use command ldd to check the dependency of this file, we get:


We can see that the file is depended by, and it can’t be found. Although is in the same folder with, but the folder is not in the folder list specified by LD_LIBRARY_PATH, so it will not be searched.


      Add the folder which contains to LD_LIBRARY_PATH in the

Create remote data source

       After all the configuration finished, create a remote data source following the procedure introduced in “SAP HANA Smart Data Access(1)——A brief introduction to SDA”.  The tables in remote data source can be viewed through SAP HANA Studio after remote data source created, as shown below:



      In this blog, we take oracle as an example to illustrate how to install and configure the ODBC manager and ODBC driver of remote data source, and simply discuss some error which may happen during the procedure of installation and configuration. The installation and configuration of other data sources supported by SAP HANA SDA is similar, some little difference would be introduced in subsequent blogs of this series.


  1. SAP HANA Smart Data Access(1)——A brief introduction to SDA
  2. Section 6.1.1 of SAP HANA Administrator Guide

We have a Chinese version (”SAP HANA Smart Data Access(一)——初识SDA”) of this blog.


      In the application scenarios of SAP HANA, it is common to analyze and process the data located in other systems. Usually, customers would like to replicate the data from other systems to SAP HANA, and then do analysis and processing in SAP HANA. However, data replication not only costs time and memory, but also, usually, requires another replication system deployed which is always not easy. SDA, abbreviation for Smart Data Access, provides customers a new way to access the date in remote data source.

What is SDA

SDA is a new method of SAP HANA for accessing the data stored in remote data sources. With the help of SDA, SAP HANA can create  so-called “virtual table” mapping to tables located in remote data sources, and then SAP HANA can access the data directly by accessing the “virtual table”. “virtual table” can be manipulated by SAP HANA just like an ordinary table, which means the operations, such as select, update, insert, delete, and so on, are all available for “virtual table”. Besides, join operation between local table and “virtual table”  is supported. When such join operation taken, optimizer of SAP HANA sends the relevant operations to remote data source for processing, and then the result set would be sent back to SAP HANA for further processing.

SDA was introduced in SAP HANA SPS06. At that time, the data sources supported by SAP HANA SDA includes: SAP HANASybase ASESAP Sybase IQTeradata database and Apache Hadoop. And only read operation was permitted for virtual table. In SAP HANA SPS07, the data sources and operations supported by SAP HANA SDA are both extended. MSSQL Sever and Oracle are added to the list of supported data source, and write operation is permitted. The comparison of SDA in SPS06 and SPS07 is as below:




Supported data sources

SAP  HANASAP  Sybase ASE 15.7 ESD#4SAP Sybase  IQ version 15.4 ESD#3 and 16.0

Teradata  database version 13.0Intel Distribution for Apache Hadoop version 2.3

All data sources supported in SPS06Oracle Database 12cMicrosoft SQL Server version 11 for SQL Server 2012

Supported operations for virtual table



Notethe data sources officially supported by SAP HANA are limited to specific versions above, other versions are not guaranteed to work well.

Creating Data Source

     The first step of accessing remote data source is to create remote data source in SAP HANA. The communication between SAP HANA and remote data source is based on ODBC protocol. The subsequent blogs of this series will talk about how to deploy remote data source in SAP HANA server side. Here, let’s simply talk about how to create remote data source in SAP HANA Studio.

      In SAP HANA Studio, there are two ways to create remote data sources, one is by GUI, another is using SQL statement.

(1) Create remote data source with GUI:


Firstly, Open the folder called “Provisioning”. And then right click the “Remote Sources” with mouse, select “New Remote Source…”:


Secondly, choose one adapter from the adapter list in the popup dialog, and fill in corresponding connection and authentication information of the remote data source.

Lastly, press the run button 5.png to create data source.

(2) Create remote data source with SQL:


ADAPTER <adapter_name> [CONFIGURATION FILE 'filename']

CONFIGURATION <connection_info_string>




ADAPTER "odbc"

CONFIGURATION FILE ‘property_orcl.ini’



USING ‘user=OUTLN;password=Aa111111';

In above SQL statement, <adapter_name> can be one of:  ASEODBC, IQODBC,TDODBC, HIVEODBC,ODBC. Obviously, ASEODBC is for Sybase ASE as data source, IQODBC is for Sybase IQ, TDODBC is for Teradata Database, HIVEODBC is for Hadoop. And ODBC adapter is for other common data sources. <connection_info_string> is used to specify the connection information for data source, the name of DSN is usually  given here. <opt_credentials_clause> is used to specify the authentication information of data source. Attention please, only adapter ODBC requires the CONFGURATION FILE, the functionality of configuration file will be introduced in next section.

Generic Adapter Framework

With the help of SDA, SAP HANA can communicate with the data sources who supports the ODBC protocol. However, just as discussed above, the supported data source of SAP HANA SDA is still limited now. For the supported specialized data source, SAP HANA has provided native code to support their operations. But SAP HANA SDA can’t guarantee other data sources to work well. What stop SAP HANA SDA from supporting more ODBC data sources is that some operation and configuration of these ODBC data sources can’t be processed by standard ODBC interface. For example, prepare a transaction for Sybase ASE requires some additional code which is not included in standard ODBC protocol. As SAP HANA provides such code, the operation for Sybase ASE is supported.

In order to decrease the influence of this issue, SAP HANA SDA applies Generic Adapter Framework to implement the communication with those unsupported ODBC data sources instead of calling the specialized native code for that data source in SAP HANA. With the help of Generic Adapter Framework, you can customize the feature and action of data source by setting a configuration file. For example, you can specify the supported operations, function mapping, data type mapping of the data source in the configuration file. For convenience of illustration, we call the configuration file “Property Configuration File” in the rest of this blog.

When creating data source, SAP HANA SDA will use the Generic Adapter Framework to communicate with remote data source if the ODBC adapter is chosen. SAP HANA SDA will search the property configuration file in the folder specified by environment variable DIR_EXECUTABLE, and the file name is specified by CONFIGURATION FILE option. By SPS07, SAP HANA SDA has provided the template of property configuration file for MSSQL and Oracle. They are called property_mss.ini and property_orcl.ini, and they are both located in the folder: $DIR_EXECUTABLE/config.

After data source created, SAP HANA SDA will parse the relevant property configuration file, all the features, function mappings, data type mappings and other properties will be linked together with the data source, and influence the communication between the SAP HANA and the data source.

  A part of content of property_orcl.ini is below, we can figure out some format and function of property configuration file:


Typical process of creating data source

      Creating a remote data source in SAP HANA usually involves steps below

  1. Check whether SAP HANA provides specialized adapter for the data source, such as "ASEODBC","IQODBC","TDODBC";
  2. If specialized adapter is available, then just use it to create data source;
  3. If specialized adapter is not available, then check whether there is a specialized property configuration template file, such as template for Oracle, MSSQL;
  4. If specialized property configuration template exists, you can change the property configuration file according to your requirement, and then create data source using the modified file. For example, as long as the correctness of the modification ensured, you can disable the unnecessary functions, modify the mapping of data type or function based on you requirement.
  5. If specialized property configuration template exists, you have to create a brand new property configuration file from scratch. To create such a file, you must be familiar with the properties of the data source and the driver it use;
  6. Create the data source in SAP HANA Studio using the specialized adapter or the common adapter (ie. ODBC adapter). When using common adapter, you need to specify the property configuration file for the data source.

Note: when modifying or creating the property configuration file, only the property which is different from default value needs to be set. The mistake in property configuration file may result in incorrect action or result of data source.

Creating virtual table

                After data source created in SAP HANA Studio, “virtual table” mapping to data in remote data source can now be created in SAP HANA Studio. Similar to creating data source, there are also two ways to create virtual table:

(1) Create virtual table with GUI:


(2) Create virtual table with SQL Statement below:

   CREATE VIRTUAL TABLE <table_name> <remote_location_clause>




    In this blog, we have talked about  basic content of SAP HANA SDA. In the subsequent blogs of this series, we will further talk about how to deploy SDA data source in SAP HANA Server, how to access Hadoop with the help of SAP HANA SDA, and so on. Please pay attention.


1. What’s New SAP HANA Platform Release Notes for SPS07:

2. Section 6.1.1 of SAP HANA Administrator Guide



  • SAP LT is trigger based ,Real time replication method.
  • SAP LT replication server for SAP HANA leverage on proven SLO(System Landscape Optiomization) technology.SAP LT replication server is the ideal solution for all HANA customer who need real-time or schedule based replication sourcing from SAP and NON-SAP sources.
  • SLT replication server can be installed as stand alone server or can run on any SAP system with SAP NetWeaver 7.02 ABAP Stack(Kernel 7.02EXT) .
  • Add-on DMIS_2010_1_700 with SP5-7, NW 7.02; SAP Kernel 7.20EXT need to be installed for SLT replication to apply SPS04 for SAP HANA 1.0.
  • SLT replication system is connected with Source SAP ECC system by RFC(Remote Function call) connection and with target HANA system by DB Connection.In case of NON-SAP source system SLT connect with DB connection.
  • SAP source system contains Application tables,logging tables,triggers and read module.SLT server contains Control Module(Structure mapping & Transformation) and Writer module.In case of Non-SAP source system, Read Module resides in SLT System.
  • When replication start,logging table and trigger created in source system and data gets replicated to target system via SLT replication server and any changes in source system gets automatically replicated to target system.




Configuration steps for SAP LT Replication Server

  • Define a schema for each source system
  • Define connection to source system
  • Define DB connection into SAP HANA
  • Define replication frequency (real-time; frequency for scheduled replication)
  • Define maximum number of background jobs for data replication


STEP 1: Open Configuration and Monitoring Dashboard by using transaction code “LTR”. Click on New button to start SLT configuration.


STEP 2: Provide name of SLT configuration. Please note that a schema will be created with the same name in HANA after the completion of configuration settings.


STEP 3: Specify source system details. If the source system in SAP (ECC) then select RFC Connection. Select DB Connection for Non-SAP Source system.


STEP 4: Specify target system details. The target HANA system is connected through DB Connection. Provide HANA system connection parameters. Administrator user privilege required.


When the popup to create a new configuration is closed by pressing the OK button, the following actions are performed automatically:


  • Configuration settings are saved on the LT Replication Server
  • New user and schema are created on the HANA system with the defined target schema name (not performed if an existing schema is reused).
  • Replication control tables (RS_* tables) are created in target schema.


  • User roles for the target schema are created:
    • <target_schema>_DATA_PROV -> Role to manage data provisioning
    • <target_schema>_POWER_USER -> Contains all SQL privileges of the target schema
    • <target_schema>_USER_ADMIN -> Role to execute authority
  • A procedure to grant (RS_GRANT_ACCESS) or revoke (RS_REVOKE_ACCESS) are created in the target schema
  • Replication of tables DD02L (stores the table list), DD02T (stores the table short descriptions) and DD08L (R/3 DD: relationship definitions) is started automatically. Once those tables are replicated, the HANA studio knows which tables are available in the source system.
  • SYS_REPL and table RS_REPLICATION_COMPONENTS are created (If they don’t exist already based on a previous configuration)
  • Replication is registered in table RS_REPLICATION_COMPONENTS




Use LTRC transaction code to open LT replication Server Cockpit.We can use SLT options from SLT replication server as well as from  HANA data provisioning.Select table and replication option from LT Replication server to start/replicate/stop/suspend/resume replication.


HANA data provisioning option can be seen in Quick launch view.


Select table to start replication.


This may be useful for any Mac users who are developing or administering SAP HANA: as you may noticed long time ago Mac version of SAP HANA Studio was available for download here: . Unfortunately, recentlyMac version disappeared and only versions available for download are Windows and Linux versions.


Don't worry.


You can get most recent version of HANA Studio running on your MacOS easily.


Follow next steps to install SAP HANA Studio on your Mac:


  1. Download Eclipse Kepler from
  2. Unzip and move to Applications folder
  3. Start Eclipse
  4. Help -> Install New Software ...
  5. Add... to add repository, use this URL:
  6. Use this repository to see available software list
  7. Pick SAP HANA Tools (or more, depending on your needs)
  8. Finish installation (you will be asked to restart Eclipse)
  9. After restart, switch to HANA perspective and you are ready to start!


How to install HANA studio on Mac - Eclipse Update Site.png

How to install HANA studio on Mac - Eclipse Update Site 2.png


HANA Studio - Switch to HANA Perspective - Eclipse Kepler.png

Hi all,


This is the second post of a series that talks about the BlogProject application development with HANA XS. If you missed the first one make sure to have a look:


Given the fact that I will develop the application in parallel with the posts, I will update them when and if new requirements arise. If I miss anything, forgive me if there are any inconsistencies regarding what I am proposing and what I will actually do.


  • Application schema


The above schema represents the data model of the application. Some things that might by confusing are the “subject”, “latitude” and “longitude” columns and the POST2POST table. Let’s start with the columns. The “subject” column of the POST table is going to hold the subject of a post, which will be the result of the use of the text analysis capabilities of SAP HANA. The “latitude” and “longitude” columns will help us with the geospatial statistics. Lastly, the POST2POST table will save all the link action between the posts, storing, for each link, the post that includes the link and the post that is referenced.


  • Tables

Column or row?

In the most cases the tables we create in HANA are columnar, but you can use row tables of course, depending on your application’s needs. Each table type benefits specific processes.


Row stores:

a) Easier to insert and update

b) Better if you need the complete row, because the reconstructing is one of the most expensive column store operations

c) Preferred if the table has a small number of rows (e. g. configuration tables). 

d) If your application needs to only process a single record at one time

Column stores:

a) Only affected columns have to be read during the selection process of a query.

b) All the columns can serve as an index

c) Facilitates higher compression rates

d) Better if the table holds huge amounts of data that should be aggregated and analyzed

e) Better if the table has a large number of columns. 

f) Preferred if calculations are typically executed on single or a few columns only. 

To sum up, the basic difference is the type of processes for which we use each table type. In OLAP processes it is better to use column stores because for analysis we query certain columns and column stores provide a much better compression, thus minimizing the querying time. On the other hand, row stores are better for OLTP processes facilitating fast inserts and updates.


Notes and suggestions:

To enable fast on-the-fly aggregations, ad-hoc reporting, and to benefit from compression mechanisms on transactional data it is recommended to store them in a column-based table.

If you need to join tables avoid using different storage types, since using both storage engines reduces performance.

Attribute, Analytic and Calculation Views are only supported on columnar tables.

Enabling search is only possible on column tables.


How to create

There are two ways to create a table:


Via repository

We can create a table via the repository by creating a new file in the “data” folder, with the extension .hdbtable. (If we choose to create a new “Database Table” and not a “File”, then the repository understands the file extension and we don’t need to add it. This applies to all file types). Then all we have to do is execute an SQL CREATE statement.


Via catalog

To create a table, simply create a “New Table” in the “tables” folder of the schema and then add the columns, types etc. in a graphical manner. If we create our tables that way, we will not be able to add a foreign key to the table. To do so we will have to add it with a simple SQL statement.


For our application I have created the 5 tables below via the “Systems” view (Catalog):







  • Sequences

HANA does not support the “autoincrement” of a column (e.g. id), so we need to create a sequence to do that, which also provides additional capabilities.


Sequence variables

A sequence definition includes the below variables, of which only the schema is compulsory, some have default values and the rest are optional:


schema = "";

increment_by = integer; //the incrementation value (default = 1)

start_with = integer; //the first value of the sequence (default = -1)

maxvalue = integer; //the maximum value of the sequence

nomaxvalue = boolean; //if the sequence has a max value or not (default = false)

minvalue = integer; // the minimum value of the sequence

nominvalue = boolean; // if the sequence has a min value or not (default = false)

cycles = boolean;//if the sequence starts with the minvalue after the maxvalue has been reached, or the opposite

reset_by = ""; //the query that will be used on server reboot to find the value that the sequence will start with

public = boolean; //(default = false)

depends_on_table = ""; //the dependency to a specific table

depends_on_view = ""; // the dependency to a specific view


How to create

Via repository

Create a new file inside the “data” folder of the repository with the extension .hdbsequence. In our application, we just want the ID columns to increment by 1 so I used a few variables. For example, for the USER table:


schema= "BlogProject";

start_with= 1;

depends_on_table= "BlogProject::USER";


Via catalog

Create a “New Sequence” in the “Sequences” folder of the schema and add the values of the sequence in a graphical manner as shown below:


For the BlogProject I created the below sequences. The POST2POST table does not need a sequence because it does not have an ID column, but two columns that are foreign keys referencing the POST table’s ID.



Note: A single sequence can be used for all tables, but its value is incrementing regardless of the table. For example if we get the value 5 from a call of the sequence and then call it for another table, we get the value 6 and not the next value we were probably expecting for the specific column of the table. If we want a single incremental value for each table, then we must create different sequences, one for each.


Calling a sequence

Sequences are not associated with tables, but can only be restricted to apply to certain tables (with the “depends_on_table” variable). In fact, they are used by applications through SQL statements, which can use CURRVAL to get the current value of the sequence, or NEXTVAL to get the next value.


For example if we want to insert a new user to our USER table we execute this statement:


insert into "BlogProject"."USER"

values ("BlogProject"."".NEXTVAL, 'user', 'user', 'user', 1)


  • Accessing database objects

When we access a table from a file in the repository we write “schema::table_name”, and if we call it from the SQL console we write “schema”.”table_name”. The same rule applies to all the database objects.






And now we came to the end of our persistence model creation. In the next post we will talk about the modeling of our data using information views and other data-centric application logic enablers (triggers, functions and procedures).

Find the 3rd part here:
Introduction to HANA XS application development (Part 3): BlogProject data modeling with information views


Thanks for reading!

Hi all,


This is the first post of a series that will engage with the development, from start to end, of a simple application on the SAP HANA platform, accompanied of course by the underlying theory. To offer a spherical presentation of the platform, I tried to use a variety of HANA capabilities, but of course some things will be missing. Further information can be found in the developer guide.


  • Application

  To begin with, let’s see what our application is about. The application that I will present here is called BlogProject and I think that what it will do is pretty obvious considering the name, but I am going to describe it anyway.  So, in the app a user will be able to login/register, search for and view a post, write one or write a comment on a certain post.  Simple as that. I may extend the application, adding a user profile, for a user to be able to see his/her own posts and comments, or some kind of statistics page, for example a geographical distribution of the posts.


  • Prerequisites

1.   You have access to an SAP HANA system. If you don’t have, then explore your options and choose the right one for


2.   Add your HANA system to your HANA studio


  • Studio explanation

  Let’s take a glimpse to the HANA studio. The studio includes different perspectives, each giving a different view.



Project Explorer: here we can see all our projects and files included in each.

Repositories: here we see all your local workspaces

Systems: in this view we can add and manage our HANA systems and database objects


  • Repository Workspace

First, we have to create a workspace for our project. In the “Repositories” view right click -> New Repository Workspace. In the window choose your user (SYSTEM in this case). Then give a name to your workspace. In our case “workspace1” (don’t ask why…) and a location where to save it.


  • Project

  After the workspace creation we have to create our project. To do so we have to go to the “Project Explorer”, right click and then “New” -> “Project”. Then find and choose XS Project (SAP HANA -> Application Development).


After the project is created, it is a good practice to create the following folders, in order for our project to be organized: data, model, procedures, services and ui. We will see later what files we create in each folder.


Share and activate the project: To be able to activate our project we have to share it first, adding it to our workspace. To share it follow this procedure: right click on your project-> “Team” -> “Share Project” -> Choose “SAP HANA Repository” -> Choose (or add) your Workspace and click “Finish”.


We must always activate our project when we make any changes, for them to be committed and shared. To do so right click on your project or the file(s) you want to activate -> “Team” -> ”Activate”.


.xsapp .xsaccess files: These files are necessary for exposing any content via the XSEngine web server. Create the three files but without a name, just the extension. The .xsapp file does not contain any content. In the .xsaccess file paste the following:





  • Schema

For our application to be able to create and access database objects we have to create a schema, which will hold all the database objects we will create, such as tables, SQL views and stored procedures. By default we are granted a schema called “SYSTEM”, but it is a better practice to create a separate schema for our application, so that your database is more organized.


The procedure is very easy. We just have to create in the “data” folder of our project a new file with the extension .hdbschema. The definition of the schema is the following




Or we can open the SQL console from the “Systems” view and execute the following create statement:




  • Authorization model

Authorization is about providing access to specific resources only to certain users. The basic entity of the authorization model is a privilege. Privileges can be granted to a role, which is a group of privileges, or a user. Additionally, a user can be granted certain roles. The best practice it to assign privileges to roles and roles to users. That way the authorization model will be more organized.


First, let’s create a privilege to our application. We have to create an .xsprivileges file without a name. In the .xsprivileges paste the following:





"description":"Basic usage privilege"




"description":"Administration privilege"

} ]


Now that we have created the privileges to our application, we must grant them to a role so that we can have access. To create a role we just create a new file with the extension .hdbrole. Inside the file we type the following definition.


role MyBlogProject::model_access {

       application privilege: MyBlogProject::Admin;



Now our role has Admin rights upon our application. Then to assign the role to our user “SYSTEM” we have to go to the “Systems” view -> “Security” -> “Users” -> SYSTEM -> “Granted Roles”  and then add the role MyBlogProject::model_access we created.


Next for our user and application to be able to access and write on the schema we created, we have to go to again to the security tab -> “roles” -> MyBlogProject::model_access -> object privileges and add our schema  “BlogProject. Now that we added this privilege to our role, all the users who have the certain role will also have the same privileges to the schema.


As you may have noticed I granted our role certain privileges twice, the first time via the .hdbrole file in the repository and the second via the “Systems” view. We can edit the authentication model using both ways.




This concludes this first post of the series. Next we will see how to create our data definition (aka persistence model) beyond the creation of the schema that was illustrated above, creating tables and sequences:

I would like to share my SAP HANA certification exam experience and some of the TIPS for the preparation of SAP-HANA certification.


First I would like to brief about myself  , well, I started my carrier as programmer and then got interest in database development.I am NON-SAP professional,having hand on experience in MSSQL,MSSAS. I then started reading about In-Memory database technology and quite impressed with SAP-HANA technology and platform as In-memory database which is dual in nature (OLTP as well as OLAP).I generated huge interest to learn the new innovation technology HANA. I started my learning journey of SAP HANA in the starting of year 2014 and today on 22nd May 2014 I became SAP CERTIFIED APPLICATION ASSOCIATE-SAP HANA by qualifying the certification exam and scored 97%.


My MSSQL knowledge help me a lot to understand the SQLScript,Procedure,database development part of SAP HANA.

My MSSSAS knowledge help me to understand the DATA MODELING concept of SAP HANA.

My MSSIS knowledge help me to understand the ETL technology of SAP(BO-DATA SERVICES) of SAP HANA.




  1. Fist step to know the exam course content and topic areas as described in SAP training site-


          Data Provisioning                                                                                                                                > 12%

          Describe possible scenarios and tools for replicating and loading data into SAP HANA from different data sources (e.g. SAP Landscape Transformation (SLT), SAP Data Services, or Direct Extractor Connection (DXC)).

Security and Authorization                                                                                                               8% - 12%

Describe the concept of authorization in SAP HANA, and implement a security model using analytic privileges, SQL privileges, pre-defined roles and schemas. Perform basic security and authorization troubleshooting.

Data modeling - Analytical views                                                                                                    8% - 12%

Implement data models with SAP HANA using analytical views. Advise on the modeling approach and best practices.

Data modeling - Calculation views                                                                                                  8% - 12%

Implement data models with SAP HANA using calculation views, and advise on the modeling approach and best practices.

Advanced data modeling                                                                                                                  8% - 12%

Apply advanced data modeling techniques, including currency conversion, variables and input parameters. Implement decision automation using business rules.

Optimizion of data models and reporting                                                                                       8% - 12%

Monitor, investigate and optimize data models and reporting performance on SAP HANA. Advise on modeling approach and tools to achieve optimum performance. Evaluate the impact of different implementation options such as table joins, aggregation, or filters. Understand the implication on performance of the various reporting tools and connetion types.

Administration of data models                                                                                                         8% - 12%

Administer data models in SAP HANA, including setting of information validation rules, managing schemas, the importing/exporting and transporting of data models.

Reporting                                                                                                                                              < 8%

Provide advice on reporting strategies, and perform appropriate reporting solutions with SAP HANA. Build reports using various tools, for example, Microsoft Excel or SAP BusinessObjects BI tools.

Data modeling - SQL Script                                                                                                               < 8%

Apply SQL Script to enhance the data models in SAP HANA using AFL, CE functions, and ANSI-SQL.

Data modeling - Attribute views                                                                                                       < 8%

Implement data models with SAP HANA using attribute views, and advise on the modeling approach and best practices.

Deployment scenarios of SAP HANA                                                                                             < 8%

Describe the deployment scenarios for SAP HANA and evaluate appropriate system configurations.

SAP HANA Live & Rapid Deployment Solutions for SAP HANA                                                 < 8%

Describe the value of HANA and identify scenarios for SAP delivered content for SAP HANA, such as SAP HANA Live and Rapids Deplyment Solutions.


   2. Analyse which topic area covers more proportionate of the course content.As stated above DATA PROVISIONING and DATA MODELING covers almost 70% of the course content.If we keep hold on this then 70% of exam topic covers.


   3. Now we know the focus area of exam,start collecting and reading content of above topics.There are complete resources available in SAP site which gives broad level of understanding of SAP HANA DATA PROVISIONING,MODELING and other topic area.      


   4. SAP HANA Academy - YouTube video tutorials is huge library to gain further knowledge of above topics.     


   5. openSAP course materials and videos will help a lot to get hang on new innovation technology of SAP HANA.


   6. Refer my blog post on DATA MODELING.


          Data Modeling in SAP HANA with sample eFashion Database-Part I

          Data Modeling in SAP HANA with sample eFashion Database-Part II



   7. Start reading content and video tutorials topic by topic,let say first start DATA PROVISIONING and read all content and watch all videos tutorials on this topic.Note down the key concept,methods and flow of topic.Keep focusing on graphical representation of content area which will help to memorize the key        concept easily.


   8. Practical experience is quite important as most of the question are simply based on HANA STUDIO UI.SAP provides 30 days free access to HANA on Amazon cloud services.Please refer SAP HANA One on AWS Marketplace




  1. All question shall be multiple choice.most of the questions shall be single selection.However there are questions which contains more than one answer too(multiple select).
  2. No negative marking for wrong answer.If question contains more than one wright answer then all right option must be selected to make the question rightly answered otherwise it will be treated as wrong answer even if one option selected rightly.
  3. All questions has been grouped based on the topic described above.For example reporting section will contain 4 question and grouped by heading Reporting and all 4 question will be in sequential order then question form next topic and so on.
  4. DATA PROVISION question patterns like,how different provisioning tools connect to HANA,like one of the question asked is which connection involves in DXC. Type of replication like which tools support real-time replication,which tools support ETL,etc.
  5. MODELING topic question patters like,function of different views(Attribute,Analytic & calculation),variable,Input parameters,Hierarchy, different Join,CE function,SQLScript and procedure etc
  6. Reporting section generaly covers how SAP HANA talk to different reporting tool.Mainely the connectivity of SAP HANA to reporting tools.Like how Crystal Report,WEBI,DASHBOARD,EXPLORER,ANALYSIS EDITION FOR OLAP,ANALYSIS FOR MS OFFICE,MS EXCEL etc. can be connected to HANA.
  7. Different type of Privileges(Object,System,Pakage,SQL),which privilege does what authorization rule in HANA,Users and role assignment etc.


Hope this blog will help for those who set their mind to go for HANA certification.


Good Luck.


Mohammad Shafiullah

To set the context lets do things in HANA TA without the CGUL rules first.


1. Lets create a small table with texts


So lets create a Table which looks something like this:



Now lets create two texts in it:

insert into "S_JTRND"."TA_TEST" values(1,'EN','TO BE','');

insert into "S_JTRND"."TA_TEST" values(2,'EN','NOT TO BE','');


So now the table entries look like:



2. Text Analysis Via Dictionary

Now lets say we want to do text Analysis where we Say

  1. if the text is "TO BE" it is to be treated as POSITIVE_CONTEXT
  2. if the text is "NOT TO BE" it is to be treated as NEGATIVE_CONTEXT


Lets create a dictionary to have these two values:

So in XSJS Project we create a english-Contextdict.hdbtextdict and content will be as follows(also attached):

<dictionary xmlns="">

  <entity_category name="POSITIVE_CONTEXT">

    <entity_name standard_form="TO BE">

      <variant name="TO BE" />



  <entity_category name="NEGATIVE_CONTEXT">

    <entity_name standard_form="NOT TO BE">

      <variant name="NOT TO BE" />







Now we use the dictionary above to create a configuration file(also attached):

So, pick content from any .hdbtextconfig and add the path to the above dictionary in it:

  <configuration name="SAP.TextAnalysis.DocumentAnalysis.Extraction.ExtractionAnalyzer.TF" based-on="CommonSettings">

  <property name="Dictionaries" type="string-list">






3. Create Full text index on the Table using this configuration













Check the TA results:



Note* for NOT TO BE, we did not get both POSTIVE(for substring TO BE) AND NEGATIVE, altough this is good, its a fluke, as TA did take the longest string matching and hence for NOT TO BE, and its sub String TO  BE we got a Negative, but this could create problems.


Now moving on, lets add more to this context, lets add text NOT-TO BE as also a possibility of NEGATIVE_CONTEXT, infact NOT, followed by, TO BE,in same sentence is to be a NEGATIVE_CONTEXT.


Without changing anything lets insert some more values and see how they look:

insert into "S_JTRND"."TA_TEST" values(3,'EN','NOT-TO BE','');

insert into "S_JTRND"."TA_TEST" values(4,'EN','NOT, TO BE','');

insert into "S_JTRND"."TA_TEST" values(5,'EN','NOT, Negates TO BE','');


Check the TA results:



So you see we now have a problem, Also we could have NOT, -, NEG etc as possible predecessors before TO BE to point that its a NEGATIVE_CONTEXT


Solution 1: Lets have synonyms of NOT as one category, TO BE as "CONTEXT" category, and in post processing of TA lets see if we have TA_TYPE value of CONTEXT and NEGATIVE in same sentence then its a NEGATIVE CONTEXT,


But wouldnt it be great if index could do this on its own?


CGUL Rules save the day:


So here we go:


4. CREATE A .rul file

CONTEXT.rul(also attached) containing following rule:

#group NEGATIVE_CONTEXT (scope="sentence") : { <NOT> <>*? <TO> <>*? <BE> }


We need to compile this rule to get a .fsm file and put it on server under ...lexicon/lang (oos for this blog, I have attached the complied file here)


Now enhance you configuration file with reference to this fsm file.


<configuration name="SAP.TextAnalysis.DocumentAnalysis.Extraction.ExtractionAnalyzer.TF" based-on="CommonSettings">

  <property name="Dictionaries" type="string-list">




  <property name="ExtractionRules" type="string-list">







5. Restart the indexserver process so that the newly compiled rule file is picked up by the system.




6. Recreate the index using the same statement as above and check the TA table:




So, as you see the highlighted values come from the rule and mark extracted NEGATIVE CONTEXT, below I kept the dictionary value which wrongly identified the POSITIVE_CONTEXT for comparison, this should ideally not be handled by dictionaries.


So, in this context: To Be or Not To Be: HANA Text Analysis CGUL rules indeed has the answer!!


Hope this helps,

Bricks and Bats are Welcome

  • Hierarchies
  • Restricted & calculated measures
  • Input Parameters
  • Currency conversion
  • Filter operations and variables


HIERARCHIES: Hierarchies are used to structure and define the relationships among attributes of attribute views used for business analysis.HANA supports two types of hierarchy.

  • Level Hierarchies are hierarchies that are rigid in nature, where the root and the child nodes can be accessed only in the defined order. For example, organizational structures, and so on.
  • Parent/Child Hierarchies are value hierarchies, that is,hierarchies derived from the value of a node. For example, a Bill of Materials(BOM) contains Assembly and Part hierarchies, and an Employee Master record contains Employee and Manager data. The hierarchy can be explored based on a selected parent; there are also cases where the child can be a parent.

This discussion will helps us to create a Level Hierarchy or a Parent Child Hierarchy in order to structure and define relationship between view attributes.

  • In the Hierarchy Type dropdown, select the required option as follows:
    • Level Hierarchy
    • Parent Child Hierarchy
  • In the Node tab page, perform the following based on your selection of hierarchy type:
    • For a Level Hierarchy you can specify the Node Style that determines the unique node ID composition. Also, add various levels and assign attributes to each of them with the Level Type that specifies the formatting instructions for the level attributes.  Specify Order By to control the hierarchy members ordering, and Sort Direction to sort the hierarchy members display in ascending or descending order.
    • For a Parent Child Hierarchy specify the parent and the child attribute. Also, in the Step Parent node specify where to place the orphan parent-child pair.
  • In the Advanced tab page, specify the other properties of the hierarchy which are common to both hierarchy types as follows:

You can set Aggregate All Nodes to true if there is a value posted on the aggregate node and you want to compute that value while aggregating data.

  • Specify the default member
  • You can select Add a Root Node if you want to create a root node if the hierarchy does not have any
  • Specify how to handle orphan nodes using Orphan Nodes dropdown
  • Select Multiple Parent if the hierarchy needs to support multiple parents for its members


  1. Select the Semantics node.
  2. In the Hierarchies panel, choose Create option.
  3. Enter a name and description for the hierarchy.
  4. In the Hierarchy Type dropdown, select Level Hierarchy.
  5. In the Node tab page do the following:
    1. Select the required value from the Node Style dropdown list. Note Node style determines the composition of a unique node ID. The different values for node styles are as:
      • Level Name - the unique node ID is composed of level name and node name, for example "[Level 2].[B2]".
      • Name Only - the unique node ID is composed of level name, for example "B2".
      • Name Path - the unique node ID is composed of the result node name and the names of all ancestors apart from the (single physical) root node. For example "[A1].[B2]".
    2. Add the required columns as levels from the drop-down list. Note You can select columns from the required table fields in the drop-down list to add to the view.
    3. Select the required Level Type. Note The level type is used to specify formatting instructions for the level attributes. For example, a level of the type LEVEL_TYPE_TIME_MONTHS can indicate that the attributes of the level should have a text format such as "January", and LEVEL_TYPE_REGULAR indicates that a level does not require any special formatting.
    4. To control how the memebers of the hierarchy are ordered, select the required column in the OrderBy drop-down list. Note In the MDX client tools, the members will be sorted on this attribute.
    5. To sort the display of the hierarchy members in the ascending or descending order, select the required option from the Sort Direction drop-down list.
  6. In the Advanced tab page do the following:
    1. Select the required value in the Aggregate All Nodes. Note This option indicates that data is posted on aggregate nodes and should be shown in the user interface. For example, if you have the members A with value 100, A1 with value 10, and A2 with value 20 where A1 and A2 are children of A. By default the value is set to false, and you will see a value of 30 for A. With the value set to true, you will count the posted value 100 for A as well and see a result of 130. If you are sure that there is no data posted on aggregate nodes you should set the option to false. The engine will then calculate the hierarchy faster as when the option is set. Note that this flag is only interpreted by the SAP HANA MDX engine. In the BW OLAP engine the node values are always counted.
    2. Enter a value for the default member.
    3. To specify how to handle the orphan nodes in the hierarchy, select the required option as described below from the dropdown.
      Root NodeTreat them as root nodes
      ErrorStop processing and show an error
      IgnoreIgnore them
      Step ParentPut them under a step-parent node Note This enables you to create a text node and place all the orphaned nodes under this node.
  • Optional Step: If you have selected Step Parent in the Orphan Nodes drop-down, enter a value to create the step-parent node.
  • Select the Add a Root Node check-box if required. Note If a hierarchy does not have a root node but needs one for reporting use case, select this option. This will create a root node with the technical name “ALL” .
  • If the level hierarchy needs to support multiple parents for its elements for example, Country 'Turkey' to be assigned to two regions 'Europe' and 'Asia', select the Multiple Parent check-box.
  • Choose OK.

Let us take a scenario to explain Level Hierarchies. In eFashion package we created an Attribute View for Article_Lookup table below is the table fields

            • ARTICLE_ID
            • ARTICLE_LABEL
            • CATEGORY
            • SALE_PRICE
            • FAMILY_NAME
            • FAMILY_CODE

Here we can define a hierarchy by using field CATEGORY & ARTICLE_LABEL.

  • Open AV_AL (Attribute View-Article_Lookup)
  • Select Semantics.
  • Click on PLUS icon on Hierarchy box.
  • Provide Hierarchy definition(Name, Type, Node)
  • In Hierarchy Type –select Level Hierarchy.
  • OK,Validate & Save and Activate.
  • Activate Analytic View wherein AV_AL consumed.


  • Open Excel workbook > Data > From Other
    Sources > Select “From Data Connection Wizard”
  • Select Other/Advanced from Data ConnectionWizard.
  • Provide SAP HANA login credential, OK.
  • Select Package and cube (AV_SHOP_FACT), NEXT




  1. Select the Semantics node.
  2. In the Hierarchies panel, choose Create option .
  3. Enter a name and description for the hierarchy.
  4. In the Hierarchy Type dropdown, choose Parent Child Hierarchy.
  5. In the Node tab page, add the parent and child nodes by selecting the Parent Node and Child Node from the drop-down list. Note In case you decide to place the orphaned parent-child pair under a node called Step Parent from the Advanced tab page, you can specify its value in the Step Parent column. The step-parent node could only be one of the column or calculated column or the current view. You can specify different step-parent values for all the parent-child pairs. These values appear as a comma separated list in the Advance tab page Step Parent field. In case of a single parent-child node, you can also specify the value for step-parent node in the Advanced tab page. The same value appears in the Node tab page.
  6. In the Advanced tab page, do the following: Choose OK.
    1. Select the required value in the Aggregate All Nodes. Note This option indicates that data is posted on aggregate nodes and should be shown in the user interface. For example, if you have the members A with value 100, A1 with value 10, and A2 with value 20 where A1 and A2 are children of A. By default the value is set to false, and you will see a value of 30 for A. With the value set to true, you will count the posted value 100 for A as well and see a result of 130. If you are sure that there is no data posted on aggregate nodes you should set the option to false. The engine will then calculate the hierarchy faster as when the option is set. Note that this flag is only interpreted by the SAP HANA MDX engine. In the BW OLAP engine the node values are always counted.
    2. Enter a value for the default member.
    3. To specify how to handle the orphan nodes in the hierarchy, select the required option as described below from the dropdown.
      Root NodeTreat them as root nodes
      ErrorStop processing and show an error
      IgnoreIgnore them
      Step ParentPut them under a step-parent node Note This enables you to create a text node and place all the orphaned nodes under this node.
    4. Optional Step: If you have selected Step Parent in the Orphan Nodes dropdown, enter a value to create the step-parent node.
    5. Select the Add Root Node checkbox if required as described below. Note If a hierarchy does not have a root node but needs one for reporting use case, set the option to true. This will create a root node.
    6. If the level hierarchy needs to support mulitple parents for its elements for example, Country 'Turkey' to be assigned to two regions 'Europe' and 'Asia', select the Mulitple Parent checkbox.


Note: The hierarchies belonging to an attribute view are available in an analytic view that reuses the attribute view, in read-only mode. However, the hierarchies belonging to an attribute view are not available in a calculation view that reuses the attribute view.

Let us take a scenario of Parent –Child relationship in ITEM_MASTER table. Here for Item ID 2 and 3 the parent ID is 1 i.e. to say item id 2 & 3 falls under CONSUMABLES. Similarly Item 5 to 8 fall under STEEL category. Parent/Child Hierarchy type can be used to define Hierarchy in this scenario.


21Cutting Disc 4”
31Grinding Disc 4”
54Plate 10mm





Restricted Columns are used to filter the value based on the user defined rules on the attribute values.

Restricted Column dialog helps to create a restricted column and filter its value based on the columns that you select in the Restrictions view. In the Column dropdown, you can select a column of type measure for which you want to apply filter. In the Restrictions view, to apply a filter, you need to choose Add Restriction. Select a Column, an Operator, and enter a value. For example, you can create a restricted column to view the Revenue of a particular country, where Revenue is the measure and Country is the attribute having a list of countries. If you have added many restrictions, and do not want to apply all of them but want to retain them, deselect the Include checkbox.

Creating a Restricted Column

You use this procedure to create a restricted column to filter the value based on the user-defined restrictions for the attribute values.


For example, to filter the sales of a product based on a category you can create a restricted column Sales based on a measure Total Sales amount, and attribute category where you can restrict the value for category.


1.     In the Output panel of the Logical Join, right-click Restricted Columns, and choose New.


2.     Enter a name and description for the restricted column.


3.     From the Column dropdown, select a measure.


4.      In the Restrictions view, choose Add Restriction.


1.      In the Column dropdown, select column to define filter.


2.      Select the required operator.


3.      Enter the filter value.


4.      If you want to remove a particular filter on the column, deselect its corresponding Include checkbox.


5.      Choose OK.


Data preview shows Sales amount for only restricted attributes i.e, category "Jewelry" & Category "Pants".


Lets review below senario where in "Jewelry" category has been excluded.


Here you can see sales amount of all category except category "Jewelry".



CALCULATED COLUMN:Calculated columns are used to derive some meaningful information in the form of columns, from existing columns.

Calculated Column dialog helps you to derive a calculated column of type attribute or measure based on the existing columns, calculated columns,restricted columns, and input. You can write the formula in the Expression panel or assemble it making use of the available Elements, Operators and Functions.

You can specify how to aggregate row data for calculated column of type measure using the Calculate Before Aggregation checkbox and specifying Aggregation Type. If you select the Calculate Before Aggregation,the calculation happens as per the expression specified and then the results are aggregated as SUM, MIN, MAX or COUNT. If Calculate Before Aggregation is not selected, the data is not aggregated but it gets calculated as per calculation expression (formula), and the Aggregation is shown as FORMULA. After writing the expression, you can validate it using Validate.

You can also associate a calculated column with Currency or Unit of Measure using the Advanced tab page.


You use calculated columns to derive some meaningful information, in the form of columns, from existing columns, calculated columns, restricted columns and input parameters. 


For example:

  • To derive postal address based on the existing attributes.
  • To prefix the customer contact number with the country code based on the input parameter country.
  • To write formula in order to derive values like,
    if("PRODUCT" = 'ABC, "DISCOUNT" * 0.10,
    "DISCOUNT") if attribute PRODUCT equals the string ‘ABC’ then DISCOUNT
    equals to DISCOUNT multiplied by 0.10 should be returned. Otherwise the
    original value of attribute DISCOUNT should be used.



     In the Output panel of the Logical Join,right-click Calculated Columns, and choose New.

2.     Enter a name and description for the calculated column.

3.     Select the data type, and enter length and scale for the calculated column.

4.     Select the Column Type to specify the calculated column as attribute or measure.

5.     In case of measure column type, if you select Calculate Before Aggregation, select the aggregation type.

Note: If you select Calculate Before Aggregation, the calculation happens as per the expression specified and then the results are aggregated as SUM, MIN, MAX or COUNT. If Calculate Before Aggregation is not selected, the data is not aggregated but it gets calculated as per calculation expression (formula), and the Aggregation is shown as FORMULA.If the aggregatoin is not set, then it will be considered as attribute.

6.     In the expression editor enter the expression or assemble it using the menus in the below window.

7.     If you want to associate the calculated column with currency and unit of measuring quantity, select the Advanced tab page and select the required type.

8.     Choose OK.





You use this procedure to allow you to provide input for the parameters within stored procedures, to obtain a desired functionality when the procedure is executed.

In an Analytic View you use input parameters as placeholders during currency conversion, formulas like calculated columns where the calculation of the formula is based on the input you provide at runtime during data preview. Input parameters are not used for filtering attribute data in Analytic View that is achieved using variables.


In calculation Views you can use input parameter to during currency conversion, calculated measures, input parameters of the script node and to filter data as well.

You can apply input parameters in analytic and calculation views. If a calculation view is created using an analytic view with input parameters, those input parameters are also available in the calculation view but you cannot edit them.

The following types of input parameters are supported:              





Attribute Value/

Use this when the value
  of a parameter comes from an attribute.

Currency (available
  in Calculation View only)

Use this when the
  value of a parameter is in a currency format, for example, to specify the
  target currency during currency conversion.

Date (available in Calculation
  View only)

Use this when the
  value of a parameter is in a date format, for example, to specify the date
  during currency conversion.

Static List

Use this when the
  value of a parameter comes from a user-defined list of values.

Derived From Table
  (available in Analytic View and Graphical Calculation View)

Use this when the
  value of a parameter comes from a table column based on some filter
  conditions and you need not provide any input at runtime.


Use this when the
  value of a parameter could be anything of the selected data type.

Direct Type
  (avaliable in Analytic View)

To specify an input
  parameter as currency and date during currency conversion.



Each type of input parameter can be either mandatory or non-mandatory. For a mandatory input parameter, it is necessary to provide a value at runtime. However, for a non-mandatory input parameter, if you have not specified a value at runtime,the data for the column where the input parameter is used remains blank.


Note:You can check whether an input parameter is mandatory or not from the properties of the input parameter in the properties pane.

  • If you want to create a formula to analyze the annual sales of a product in various regions, you can use Year and Region as input parameters.
  • If you want to preview a sales report with data for various countries in their respective currency for a particular date for correct currency conversion, you can use Currency and Date as input parameters.


In Analytic View

  1. In the Output panel of the Data Foundation or Logical Join node, right-click the Input Parameters node.
    • Note: You can also create input parameters at the Semantics node level, using the Create Input Parameter option in the Variables/Input Parameters panel.
  2. From the context menu, choose New.
  3. Enter a name and description.
  4. Select the type of input parameter from the Parameter Type drop-down list.
    1. For the Column type of input parameter, you need to select the attribute from the drop-down list. At runtime the value for the input parameter is fetched from the selected attribute's data.
    2. For input parameter of type Derived from Table, you need to select a table and one of it's column as Return Column whose value is used as input for formula caluclation. You can also define conditions to filter the values of Return Column in the Filters   panel. For example, to calculate Discount for specific clients, you can create an input parameter based on Sales table and return column Revenue with filter set on  the Client_ID.
    3. For Direct Type input parameter, specify the Semantic Type that describes the use parameter as a currency or date , for example, to specify the target currency during currency conversion.
  5. If required, select a data type.
  6. Enter length and scale for the input parameter.
  7. Choose OK.

    In Calculation View


    1. In the Output panel,right-click the Input Parameters node.
    2. From the context menu, choose New.
    3. Enter a name and description.
    4. Select the type of input parameter from the drop-down list.
      1. For the Attribute Value type of input parameter, you need to select the attribute from the drop-down list. At runtime the value for the input parameter is fetched from the selected attribute's data.
      2. For input parameter of type Derived from Table, you need to select a table and one of it's column as Return Column whose value is used as input for formula caluclation. You can also define conditions to filter the values of Return Column in the Filters panel. For example, to calculate Discount for specific clients, you can create an input parameter based on Sales table and return column Revenue with filter set on the Client_ID.
    5. Select a data type.
    6. Enter length and scale for the input parameter.
    7. Choose OK.


    Measures used in analytic views and calculation views can be defined as amount or weight in the analytical space using currency and unit ofmeasure. You can also perform currency conversion and unit of measure conversion.

    For example, you need to generate a sales report for a region in a particular currency, and you have sales data in database tables ina different currency. You can create an analytic view by selecting the table column containing the sales data in this other currency as a measure, and perform currency conversion. Once you activate the view, you can use it to generate reports.

    Similarly, if you need to convert the unit of a measure from cubic meters to barrels to perform some volume calculation and generate reports, you can convert quantity with unit of measure.

    To simplify the process of conversion, system provides the following:

    • For currency conversion - a list of currencies, and exchange rates based on the tables imported for currency
    • For quantity unit conversion - a list of quantity units and conversion factors based on the tables imported for units.

    Currency conversion is performed based on the source currency, target currency, exchange rate, and date of conversion. You can also select currency from the attribute data used in the view.

    Similarly, quantity unit conversion is performed based on the source unit,target unit, and conversion factor.

    You can also select the target currency or unit of measure at query runtime using input parameters. If you use this approach then, you have to first create an input parameter with the desired currency/unit specified, and use the same input parameter as target in the conversion dialog.
    Currency conversion is enabled for analytic views and base measures of calculation views.



    You have imported tables T006, T006D, and T006A for Unit of Measure.

    You have imported TCURC, TCURF, TCURN, TCURR, TCURT, TCURV, TCURW, and TCURX for currency.



    1. Select a measure.
    2. In the Properties pane, select Measure Type.
    3. If you want to associate the measure with a currency, perform the following substeps:

                 a. In the Measure Type dropdown list, select the value Amount with Currency.

                 b.In the Currency Dialog,select the required Type as follows:




      To select currency from the currency table TCURC.


      To select currency from one of the attributes used in the view.

                c. Select the required value, and choose OK.

                d. If you want to convert the value to another currency, choose Enable for Conversion.

                            i.To select the source currency, choose Currency.

                            ii..Select the target currency.

                                Note: For currency conversion, in addition to the types Fixed and Attribute, you can select Input Parameter to provide target currency at                             runtime. If you select an input parameter for specifying target currency and deselect Enable for Conversion checkbox, the target currency                             field gets clear because input parameters can be used only for currency conversion.

                             iii.To specify exchange rate type, in the Exchange Rate Types dialog, select the Type as follows:




      To select exchange rate from the currency table TCURW.

      Input Parameter

      To provide exchange rate input at runtime as input parameter.

                             iv.To specify the date for currency conversion, in the Conversion Date dialog, select the Type as follows:





      To select conversion date from the calendar.


      To select conversion date from one of the attributes used in the view.

      Input Parameter

      To provide conversion date input at runtime as input parameter.

                           v.To specify the schema where currency tables are located for conversion, in the Schema for currency conversion, select the required schema.

                           vi.To specify the client for which the conversion rates to be looked for, in the Client for currency conversion, select the required option.


                  e. From the dropdown list, select the required value that is used populate data if the conversion fails:





    In data preview, the system displays an error for conversion failure.

    Set to NULL

    In data preview, the value for the corresponding records is set to NULL.


    In data preview, you view the unconverted value for the corresponding records.

    4.If you want to associate a measure with a unit of measure other than currency, perform the following substeps:

        a. Select the value Quantity with Unit of Measure in the Measure Type drop-down list.

        b. In the Quantity Units dialog, select the required Type as follows:





      To select a unit of measure from the unit tables T006 and T006A.


      To select a unit of measure from one of the attributes used in the view.

        c. Select the required value, and choose OK.

    5.Choose OK.

    Note You can associate Currency or Unit of Measure with a calculated measure, and perform currency conversion for a calculated measure by editing it


    After we installed Hadoop in the machine, we then need to install Hive in the second section.

    1. Download Hive

    Download Hive-0.13.0 from, and unzip and put Hive package together with Hadoop (not necessary to put Hive together with hadoop, but it is easy to manage in the future).

    2. Add path to Hive

    Add the statements to $HOME/.bashrc to add path

    export HIVE_HOME=/ hive/path

    export PATH=$PATH:$HIVE_HOME/bin

    export PATH=$PATH:$HIVE_HOME/lib

    3. Make file on Hadoop file system

    Make file on hadoop file system for hive database

    hadoop fs –mkdir /user/hive/warehouse

    hadoop fs –mkdir /temp

    4. file

    Go to hive/bin, find and add:



    export HADOOP_INSTALL= /hadoop/path (the same path as in section 1)

    Start hive by typing in “hive” in console, you will see hive cli and do queries with Hiveql.


    Notice: the hive default metadata is stored by Derby. You can only access to you previous database from the same location where you start hive last time. Otherwise, you would not be able to see your previous database. Also, hive will create metastore_db and a log file anywhere you start hive. To fix it, you may set the hive metastore with mysql. I will write this instruction later.

    To Access to Hive from HANA, we first should have Hadoop and Hive installed. In the first section and the second section, installation of Hadoop and hive will be introduced. 

    1. Download Hadoop and move to directory

    Download Hadoop from apache Hadoop mirror:

    In this case, we choose Hadoop-2.2.0.

    Unzip the downloaded Hadoop package and put the Hadoop fold to directory where you want it to be installed.

    tar -zxvf  hadoop-2.2.0.tar.gz

    Switch to your Hana server user:

    su hana_user_name.

    We need to install Hadoop under Hana user, because Hana server needs to communicate with Hadoop with the same user.

    If you just want to set up Hadoop without accessing from Hana, you can simply create a dedicate Hadoop account by “addgroup” and “adduser” (these two command lines depend on system, Suse and Ubuntu seem to have different command lines)


    2. Check Java

    Before we install the Hadoop, we should make we have Java installed.


    java –version 

    to check java and find java path by

    whereis java

    And write the following script in $HOME/.bashrc to add your java path:

    export JAVA_HOME=/java/path/

    export PATH=$PATH:/java/path/

    3. SSH passwordless

    Install ssh first if you don’t have.

    Type the following commands in console to create a public key and put the key to authorized keys

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/ >> ~/.ssh/authorized_keys


    4. Add path to Hadoop

    Write the following script in $HOME/.bashrc if you want to add the Hadoop path permanently.

    Open the .bashrc file by

    vi $HOME/.bashrc

    Add the following script

    export HADOOP_INSTALL=/hadoop/path/

    For the hadoop path, I put the Hadoop folder under /usr/local,

    so I use /usr/local/hadoop instead of /hadoop/path/ in my case


    export PATH=$PATH:$HADOOP_INSTALL/sbin


    5. Hadoop configuration

    Find the configuration files, core-site.xml, hdfs-core.xml, yarn-site.xml, mapred-site.xml, in Hadoop folder. These files exist in $ HADOOP_INSTALL /etc/hadoop/ under you Hadoop folder. You may simply rename the “template file” in the folder if you can find the xml files. For example:

    cp mapred-site.xml.template mapred-site.xml

    Some other tutorials said you can find them under /conf/ directory, I guess /conf/ is for older Hadoop version, but in hadoop-2.2.0 the files are under /etc/hadoop/


    Modify the configuration files as followed:

    vi core-site.xml

    Put the following between configuration tab



    <value>hdfs://computer name or IP(localhost would also work):8020</value>



    vi hdfs-site.xml

    Put the following between configuration tab


















    vi yarn-site.xml

    Put the following between configuration tab



    <value>yourcomputername or IP</value>







    vi mapred-site.xml

    Put the following between configuration tab






    For more information about all the tabs, please check






    add the following two statement in the end of this file

    export HADOOP_COMMON_LIB_NATIVE_DIR=/hadoop/path/lib/native

    export HADOOP_OPTS="-Djava.library.path=/hadoop/path /lib"


    6. Start Hadoop

           The last thing needs to do before starting your Hadoop is to format your namenode and datanode simply by:

                hadoop namenode -format


    In the end, you can start Hadoop by calling “”, you may find this file in /hadoop/path/sbin.

    To check your Hadoop has started, type


    You should see NameNode, NodeManager, DataNode, SecondaryNameNode and ResourceManager are running.



    Alternatively, you can also check if Hadoop is running by visiting localhost:50070 to check Hadoop file system information


    and localhost:8088 to check cluster information.



    You may find that localhost:50030 contains jobtracker info in some tutorials. However, localhost:50030 does not exist in hadoop-2.2.0, because hadoop-2.2.0 divides the two major functions of the JobTracker: resource management and job life-cycle management into separate components. Don’t worry about localhost:50030 not working.


    Filter Blog

    By author:
    By date:
    By tag: