Currently Being Moderated

If you follow my blogs (I hope you do ) then you know I really love the R programming language but I also love SAP HANA and in the past I have dealt with integration between those two:

 

HANA meets R

 

R meets HANA

 

Sanitizing data in SAP HANA with R

 

But...those integrations were not done using the SAP way...which means, they are not supported or endorsed by SAP...

 

Gladly...as of today, there's an official SAP way to do it!

 

First, we need to read and follow all the instructions detailed in Get your own SAP HANA DB server on Amazon Web Services by the most awesome Juergen Schmerder. (Believe! It took me less than 10 minutes to get my SAP HANA running on my laptop...really...a piece of cake).

 

With our SAP HANA instance up and running, we can dedicate ourselves to the funny part...the #R part

 

Go to your AWS Management Console and under Amazon EC2, launch a new instance...

 

SUSU_Linux.png

 

You have to choose SUSE Linux Enterprise with 32 bit. I tried with 64 bit and it wasn't funny...didn't work and I lost a lot of time...32 bit for the win!

 

For the installation, you can follow this link SAP HANA Database Development Guide – Integration with R programming language, but at least in my case, I need to deal with a lot of difficulties, that gladly I'm going to write down in this blog, so you don't have to deal with them

 

First, we need a compiler as we're going to compile #R from it's source.

 

Get a compiler...
sudo zypper install gcc gcc-c++ gcc-fortran

 

Then we need to get and extract the #R source code.

 

Get source code, extract and compile

wget http://cran.r-project.org/src/base/R-2/R-2.13.0.tar.gz

tar zxf R-2.13.0.tar.gz && cd R-2.13.0

./configure --enable-R-shlib --with-readline=no --with-x=no

make clean

make

make install

 

This step really takes a long time...so you better go doing something more productive in the meantime...

 

When #R is finally installed, we need to download and install the Rserve package.

 

Download Rserve

wget http://www.rforge.net/Rserve/snapshot/Rserve_0.6-5.tar.gz

 

Now, we have to log into R and do the installation...

 

Install and test Rserve

R

install.packages("/PATH_TO_FILE/Rserve.tar.gz", repos = NULL)

library("Rserve") #To test the installation. If there's no output, then it's working fine

q()

 

If you get an error regarding a personal library...just say "y".

 

Once Rserve is install, we need to create a config file.

 

Rserv.conf

vi /etc/Rserv.conf

maxinbuf 10000000

Maxsendbuf 0

remote enable

#Press ESC key

:w

#Press ESC key

:q!

 

Now, we have to create a user that will run the Rserve so we can connect to it from SAP HANA.

 

 

Create new user with password

useradd -m login_name

passwd login_name

 

 

For some reason Amazon doesn't provide the password for the root user...but we might need it eventually...so just do this...after all, if your user and you're paying for it...

 

 

Change root password

sudo passwd root

#Assign a password

 

 

Great, we're are now ready to start our server! (You need to be logged as the new user that we create in a previous step).

 

 

Start the Rserve server
R CMD Rserve --RS-port 6311 --no-save --RS-encoding "utf8"

 

Now...we're ready to move to move to our SAP HANA server and keep configuring

 

Configure SAP HANA

Right click on your system node at the navigator tab

Select Administration

Select on the right hand side the Configuration tab

Select the indexserver.ini

Select the calcengine

 

#Add the following parameters...

 

cer_timeout - 300

cer_rserve_addresses - Our R Amazon server:6311

cer_rserve_maxsendsize - 0

 

One more thing, and we're ready to roll...go to your AWS Management Console, EC2 and then choose Security Groups. Our R server is going to be assign to "Quicklaunch-1". Just select it and go to Inbound. And add the port "6311".

 

Security_Group.png

 

That's all folks...we're officially ready to go!

 

On SAP HANA, create a table and call it TICKETS_BY_YEAR with the following structure:

 

HANA_R_Kiss_001.png

 

Open a SQL Editor and copy the following code...

 

Fill up TICKETS_BY_YEAR

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110101',4195);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110201',4245);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110301',4971);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110401',4469);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110501',4257);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110601',4973);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110701',4470);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110801',4981);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20110901',4530);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20111001',4167);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20111101',4059);

insert into "SYSTEM"."TICKETS_BY_YEAR" values('20111201',1483);

 

This table is supposed to hold the tickets sales for a given company, during each month of the year 2011. What we want to do here is to determine or predict how are going to our sales on 2012. We're going to use #R for that matter.

 

Create the following script and call it "Predict_Tickets". This script will have two Stored Procedures, called Prediction_Model and Get_Tickets.

 

Predict_Tickets

CREATE TYPE T_PREDICTION_TICKETS AS TABLE (

PERIOD VARCHAR(8),

TICKETS INTEGER

);

 

DROP PROCEDURE Prediction_Model;

DROP PROCEDURE Get_Tickets;

 

CREATE PROCEDURE Prediction_Model(IN tickets_year TICKETS_BY_YEAR,OUT result T_PREDICTION_TICKETS)

LANGUAGE RLANG AS

BEGIN

period=as.integer(tickets_year$PERIOD)

tickets=as.integer(tickets_year$TICKETS)

var_year=as.integer(substr(period[1],1,4))

var_year=var_year+1

new_period=gsub("^\\d{4}",var_year,period)

next_year=data.frame(year=new_period)

prt.lm<-lm(tickets ~ period)

pred=round(predict(prt.lm,next_year,interval="none"))

result<-data.frame(PERIOD=new_period,TICKETS=pred)

END;

 

CREATE PROCEDURE Get_Tickets()

LANGUAGE SQLSCRIPT AS

BEGIN

Tickets = SELECT * FROM TICKETS_BY_YEAR;

CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS);

INSERT INTO "TICKETS_BY_YEAR" SELECT * FROM :T_PREDICTION_TICKETS;

END;

 

CALL Get_Tickets();

SELECT * FROM "TICKETS_BY_YEAR";

 

As you can see, our first Stored Procedure called Prediction_Model, we're using RLANG as the script language...meaning that we're going to embed R code that will go from our SAP HANA to our R Serve and back with the modified information.

Prediction_Model is calling the Stored Procedure Get_Tickets, which is doing a select from the table TICKETS_BY_YEAR and then calling Prediction_Model to finally insert the data back into SAP HANA.

 

At the end of our script, we call Get_Tickets and do a select to TICKETS_BY_YEAR to verify that our script worked.

 

HANA_R_Kiss_002.png

Success! Our SAP HANA and R integration work like a charm! We never left SAP HANA Studio, but our code went to the R Server and back to bring us the modified information...all in just 829 milliseconds...really fast considering that both servers are in the cloud...

 

That's all for now...I will come back with more information on SAP HANA and R as soon as I can...there's still a lot to discover and test

Comments

Actions

Filter Blog

By author:
By date:
By tag: