
SAP Predictive Analysis

79 Posts

How well will you do tomorrow? How can we be sure?


Algorithmic and biomedical advances are now giving sports coaches, managers and team owners the tools to predict which players have peaked and which ones have their full potential ahead of them.

I don’t use quantitative methods much when it comes to sports; I think it takes away the excitement.




After the Super Bowl finished, I saw on Twitter that SAP had predicted that Denver would win over Seattle in a close match. As it turned out, Seattle won a rather one-sided match with a very young side.


I didn’t work on the predictive analytics solution that made the Super Bowl prediction, and I am not authorized by SAP to provide a response. But I wanted to share my personal views on this matter.

Then I saw Vijay Vijayasankar’s discussion about the perils of predictive analytics. He makes the crucial points:


Predictive analytics in general cannot be used to make absolute predictions when there are so many variables involved. In fact, I think there is no place for absolute predictions at all. And when the results are explained to the non-statistical expert user, they should not be dumbed down to the extent that they appear to be an absolute prediction.

Predictive models make assumptions, and these should be explained to the user to provide context. When a model spits out a result, it also comes with some boundaries (the probability of the prediction coming true, margin of error, confidence, etc.). When those things are not explained, predictive analytics starts to look like reading palms or tarot cards. That is a disservice to predictive analytics.

If the chance of Denver winning is 49% and Seattle winning is 51%, it doesn’t mean Seattle will certainly win. And not all users will look at it that way unless someone tells them more details.

In business, there is hardly ever an absolute prediction. Analytics provides a framework for decision making for business leaders. Analytics can say that if sales increase along the same historic trend, Latin America will outperform planned numbers next year compared to Asia. However, the global sales leader might know nuances that the predictive model had no idea of, and hence can decide to prioritize Asia. The additional context provided by predictive analytics enhances the manager’s insight and over time will lead to better decisions. The idea is definitely not to overrule the intuition and experience of the manager. Of course, the manager should understand clearly what the model is saying and use that information as a factor in decision making.

When this balance in approach is lost, predictive analytics gets an unnecessarily bad rap.
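To make the 49%/51% point concrete, here is a quick R sketch (purely illustrative, not part of any SAP model) simulating a large number of matches in which one side has a 51% chance of winning:

```r
## Simulate 100,000 matches in which the favourite wins with probability 0.51
set.seed(42)                                # reproducible draws
n <- 100000
favourite.wins <- rbinom(n, size = 1, prob = 0.51)

## The favourite's observed win rate hovers around 51%...
print(mean(favourite.wins))

## ...which means the "underdog" still wins roughly 49 times out of 100,
## so a single upset says very little about the quality of the model
print(mean(1 - favourite.wins))
```

Even a perfectly calibrated 51% model will be "wrong" in almost half of the individual matches, which is exactly why a point prediction presented without its probability is misleading.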

I thought I would post this quick blog to try and help anybody out who may come across the same issue as I did.


When trying to do some data preparation by filtering on a date in the "Predict" tab of SAP Predictive Analysis, I received the following error when running the analysis: "An error occurred while executing the query. Error details: SAP DBTech JDBC: [266]: inconsistent datatype: 3-3-2014 is not a DATE type const: line 1 col 779 (at pos 778)". Trying a different date format produced the same error:




Unfortunately, this error is vague: it does not say what a valid date format is or where to find one. I tried multiple formats until I stumbled upon one that works, i.e., YYYY-MM-DD (so 3-3-2014 becomes 2014-03-03):



Hopefully this helps some people out.


If you know of documentation or other date formats that are supported, please share!

Continuing from the previous post, we now explore sentiment analysis. First of all, let's talk about sentiment analysis and text mining and what exactly these terms mean. Wikipedia says: "Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document". It is sometimes also called opinion mining, which is extracting information from people's opinions. Opinions are usually in the form of text, and hence to do sentiment analysis we need some knowledge of text mining as well. Text mining, in the words of Hearst (1999), is "the use of large online text collections to discover new facts and trends about the world itself". Standard techniques are text classification, text clustering, ontology and taxonomy creation, document summarization and latent corpus analysis. We are going to use a combination of both sentiment analysis and text mining in the example scenario discussed below.

Before I start, let me make it clear that this is only sample data, analyzed purely for the purpose of learning. It is not meant to target or influence any brand. The outputs and analysis shown here are just based on opinions and should not be considered facts.

I downloaded some public opinion data regarding car manufacturers from the NCSI-UK website.

Scores By Industry

The data is from 2009-2013. My intention was just to see what the public sentiment for these manufacturers is on Twitter and to build a probable score for 2014 based on a sample of the Twitter population. The intention is just to see if the scores are similar to those obtained in 2013.


The steps to do sentiment analysis using SAP PA and twitter are shown below. The code is shown at the end of this post.


1. Load the necessary packages. Also load the credential file that stores the credential information required to connect to Twitter. This credential file was created using the steps shown in the post below. Then establish the handshake with Twitter.

2. Retrieve the tweets for each of the brands in our data set (nine in total) and save the information in a data frame for each car brand.

3. The next step is to analyse the tweets obtained for negative and positive words. For this we use lexicons. A lexicon ("of or for words", per Wikipedia) is basically a dictionary-like collection of words. For our sentiment analysis we are going to use the lexicon of Hu and Liu, available at Opinion Mining, Sentiment Analysis, Opinion Extraction. The Hu and Liu lexicon is a list of positive and negative opinion words or sentiment words for English (around 6,800 words). We download the lexicon and save it on our local desktop, then load the files to create arrays of positive and negative words as shown in the code. We can also append our own positive and negative words as required.

4. Now that we have arrays of positive and negative words, we compare them with the tweets we obtained and assign a score of +1 to each positive word in a tweet and -1 to each negative word. Each +1 is counted as a positive sentiment and each -1 as a negative sentiment.

The sum of the sentiment scores gives us the net sentiment for that brand. For this we need a sentiment scoring function. I have used, as-is, the function available at the website below and give full credit to its author; this function was not created by me.

How-To | Information Research and Analysis (IRA) Lab

5. After getting the sentiment score for each brand, the next step is to sum the scores and assign them to an array. We then bind this array to our original data set and use the final table to generate heat maps, as shown below:

Final Output with Sentiment Score




Heat Maps





As we see from the above analysis, although the industry score for one brand (Audi) is quite high, the current public sentiment is with another brand (Vauxhall) that had an overall low industry score. This is just a basic analysis with 500 tweets. We could extend it by increasing the number of tweets and creating a more advanced scoring function that uses other parameters like region, time and historical data when calculating the final sentiment score.

This post serves as a starting point for anyone interested in doing Sentiment Analysis using twitter. There is certainly a lot of possibility to explore.



mymain<- function(mydata, mytweetnum)



## Load the necessary packages for the Twitter connection
## (twitteR provides searchTwitter/twListToDF; RCurl handles SSL)
library(twitteR)
library(RCurl)

## Packages required for sentiment analysis
## (plyr provides laply; stringr provides str_split)
library(plyr)
library(stringr)

## Load the saved credential file

load('C:/Users/bimehta/Documents/twitter authentication.Rdata')


options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))


## Retrieving the tweets for the brands in our excel.

tweetList <- searchTwitter("#Audi", n=mytweetnum)

Audi.df = twListToDF(tweetList)


tweetList <- searchTwitter("#BMW", n= mytweetnum)

BMW.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Nissan", n= mytweetnum)

Nissan.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Toyota", n= mytweetnum)

Toyota.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Volkswagen", n= mytweetnum)

Volkswagen.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Peugeot", n= mytweetnum)

Peugeot.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Vauxhall", n= mytweetnum)

Vauxhall.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Ford", n= mytweetnum)

Ford.df = twListToDF(tweetList)


tweetList <- searchTwitter("#Renault", n= mytweetnum)

Renault.df = twListToDF(tweetList)


##Upload the Lexicon of Hu and Liu saved on your desktop

hu.liu.pos = scan('C:/Users/bimehta/Desktop/Predictive/Text Mining & SA/positive-words.txt', what='character', comment.char=';')

hu.liu.neg = scan('C:/Users/bimehta/Desktop/Predictive/Text Mining & SA/negative-words.txt', what='character', comment.char=';')


##Build an array of positive and negative words based on Lexicon and own set of words

pos.words = c(hu.liu.pos, 'upgrade')

neg.words = c(hu.liu.neg, 'wtf', 'wait','waiting','fail','mechanical','breakdown')


## Build the score sentiment function that will return the sentiment score

score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  # we want a simple array ("a") of scores back, so we use
  # "l" + "a" + "ply" = "laply":
  scores = laply(sentences, function(sentence, pos.words, neg.words) {

    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)

    # and convert to lower case:
    sentence = tolower(sentence)

    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')

    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)

    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)

    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)

    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)

  }, pos.words, neg.words, .progress=.progress )

  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}




## Creating a vector to store sentiment scores, one slot per brand;
## brands whose sum is not computed below stay at 0
a = rep(0, 9)


## Calculate the sentiment score for each brand and store the score sum in array

Audi.scores = score.sentiment(Audi.df$text, pos.words,neg.words, .progress='text')

a[1] = sum(Audi.scores$score)


Nissan.scores = score.sentiment(Nissan.df$text, pos.words,neg.words, .progress='text')



BMW.scores = score.sentiment(BMW.df$text, pos.words,neg.words, .progress='text')

a[3] =sum(BMW.scores$score)


Toyota.scores = score.sentiment(Toyota.df$text, pos.words,neg.words, .progress='text')



##Sentiment Score for other brands is considered 0



Volkswagen.scores = score.sentiment(Volkswagen.df$text, pos.words,neg.words, .progress='text')



Peugeot.scores = score.sentiment(Peugeot.df$text, pos.words,neg.words, .progress='text')



Vauxhall.scores = score.sentiment(Vauxhall.df$text, pos.words,neg.words, .progress='text')



Ford.scores = score.sentiment(Ford.df$text, pos.words,neg.words, .progress='text')



Renault.scores = score.sentiment(Renault.df$text, pos.words,neg.words, .progress='text')



## Plot the histograms for a few brands.


hist(Audi.scores$score, main="Audi Sentiments")

hist(Nissan.scores$score, main="Nissan Sentiments")

hist(Vauxhall.scores$score, main="Vauxhall Sentiments")

hist(Ford.scores$score, main="Ford Sentiments")


## Return the results by combining sentiment score with original dataset

result <- as.data.frame(cbind(mydata, a))





Code Acknowledgements:

Opinion Mining, Sentiment Analysis, Opinion Extraction

How-To | Information Research and Analysis (IRA) Lab

R by example: mining Twitter for consumer attitudes towards airlines

While doing some research on sentiment and text analysis for one of my projects, I came across a really nice blog post.



Inspired by the above, I thought of doing some sentiment analysis in SAP PA using Twitter tweets, and hence decided to go ahead and do some text mining and sentiment analysis using the twitteR package of R.

I have created a multi-series blog where we see the different things we can do using SAP PA, R and Twitter.


This first blog talks about how to get the Twitter data inside SAP PA and build a word cloud by building a text corpus.



I downloaded some public opinion data regarding car manufacturers from the NCSI-UK website.


The data is from 2009-2013. My intention was just to see what the public sentiment for these manufacturers is on Twitter and to build a probable score for 2014 based on a sample of the Twitter population. I loaded the data in SAP PA. First I build a word cloud for some of the car hashtags and plot a graph of the number of re-tweets. In the next blog postings I will be doing sentiment analysis of this data and emotion classification.


Before I start, let me make it clear that this is only sample data, analyzed purely for the purpose of learning. It is not meant to target or influence any brand. The outputs and analysis shown here are just based on opinions and should not be considered facts.

Step 1: Setting up the Twitter account and API for the handshake with R

Please refer to this step-by-step document to set up the Twitter API and the settings required to call the API and get tweet data inside R.

Setting up Twitter API to work with R


Step 2: Getting the tweet data into SAP PA and building a word cloud.

Now we need to create a custom R component to get the data into SAP PA, create a text corpus and display it as a word cloud. I have used the tm_map function that comes with the tm package to set up the text corpus data for the word cloud. The various commands are self-explanatory, as shown in the comments. I have used the wordcloud package to generate the word cloud.


The code below lists down the steps you need to do to get the desired output. The configuration settings are shown in the screenshots below.


mymain<- function(mydata, mytweet, mytweetnum)




## Load the necessary packages
## (twitteR provides searchTwitter; tm provides the corpus functions;
## wordcloud draws the cloud; RCurl handles SSL)
library(twitteR)
library(RCurl)
library(tm)
library(wordcloud)

## Enable Internet access.

## Load the environment containing twitter credential data (saved in Step 1)

load('C:/Users/bimehta/Documents/twitter authentication.Rdata')


## Establish the handshake with Twitter


options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))


##Get the tweet list from twitter site (based on parameters entered by user)

tweetList <- searchTwitter(mytweet, n=mytweetnum)


##create text corpus

r_stats_text <- sapply(tweetList, function(x) x$getText())

r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))


## Clean up the Twitter text data by removing punctuation and English stop words like "the", "an"

r_stats_text_corpus <- tm_map(r_stats_text_corpus, tolower)

r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)

r_stats_text_corpus <- tm_map(r_stats_text_corpus, removeWords, stopwords("english"))

r_stats_text_corpus <- tm_map(r_stats_text_corpus, stemDocument)



##Build and print wordcloud

out2 <-wordcloud(r_stats_text_corpus, scale=c(10,1), random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors="blue")




## Return the twitter data in a table (convert the generic tweetList to a
## data frame, rather than a brand-specific one, so the component stays reusable)
tweets.df <- twListToDF(tweetList)
result <- as.data.frame(cbind(tweets.df$text, tweets.df$created, tweets.df$statusSource, tweets.df$retweetCount))




Configuration Setting:




Running the Algorithm and getting the output:



The output table (the "created" column is returned as char):










The general opinion of the public from the word cloud seems positive. However, in my next blog we will do a detailed sentiment analysis of the various brands in our source file and plot the heat map based on the 2013 survey findings. This will help us know whether current public sentiment is in line with the survey findings.

To be continued in Sentiment Analysis.



Following on from Clarissa Dold's announcement about the KXEN acquisition at the end of 2013, I wanted to take this opportunity to introduce to you the latest addition to SAP's predictive analytics portfolio: SAP Infinite Insight.


The majority of this information is already available through Clarissa's blog and the external PA Roadmap presentation. I started chatting about this topic in this discussion: Starting with KXEN - [Updated with more info], but it wasn't enough.


So the purpose of this blog is to offer an overview of the 'solution brief', including product positioning; a description of the current software modules and deployment options; followed by some mention of future integration plans and tentative possibilities. Finally, a consolidation of useful resources (links etc.) for your own on-boarding.


I've shown this type of content during regional enablement workshops, so I'm hoping it'll be of use to you too!






  • Let's start with a positioning slide which describes some of the key benefits and features of this product. The key message here is that you don't need to be a data scientist to use the tool effectively!


1 intro.png


  • Taking this differentiation further, we can call-out the specific areas where Infinite Insight has clearly gained a competitive advantage over classic data-mining vendors:


2 intro why.png


  • Infinite Insight is revolutionizing the way companies use predictive analytics to make better decisions on petabytes of big data. Its solution approach allows line-of-business users to solve common predictive problems without the need for highly skilled data scientists.


3 model lifecycle.png


  • Infinite Insight is a suite of tools providing predictive analytic applications for the automated creation of accurate and robust predictive models.  These solutions replace the classic model creation process, which is manual, repetitive and prone to human error.


3 overview modules.png


  • Explorer is an extremely powerful data-manipulation tool, which allows the designer to create derived columns and row-values, effectively “exploding out” existing data into new compound variables and ratios. Lots of semantic definitions and transformations can be authored here into the dataset.


5 a explorer.png


  • The Modeler is the main workspace/module for mining activities: Classification, regression, segmentation and clustering. It generates statistical models, and represents them using indicators and chart types.


5 b modeller.png


  • Factory is a secured Java web-deployed interface, which includes roles & rights administration on the server platform. From there, projects are accessed by users and assigned models, and KPI evaluation/model retraining can be scheduled as tasks.


5 c factory.png


  • Scorer is a feature that exports regression & segmentation models in different programming languages. The generated code (or SQL) allows models to be applied outside of InfiniteInsight. It reproduces the operations made during data encoding and either classification/regression or clustering.


5 c scorer.png


  • Social improves decision capacities by extracting the implicit structural relational information of the dataset. You can then navigate a social network, the structure of which is represented in the form of a graph, made of nodes and links. For example, it can help identify individuals of influence within a community.


5 c social.png



5 component model.png


  • In terms of licensing and selling 'software bundles', smaller departments would likely consider the desktop "thick-client" Modeler workstation installations, whereas larger enterprises would implement the full "suite" of client-server components:


5 software bunble options.png


  • You need to be prudent when obtaining your package from the SMP download marketplace, as there are a number of items available to cover the various license and audience options:


6 installation types.png


  • Infinite Insight's data mining methods are unique in the market; here are a few of the value propositions & differentiators which set it apart from the competition:


8 the benefits of SRM.png


  • There is a wealth of existing guides and training available to help you further your knowledge of the product. The documentation is very detailed, as is the online course and the locally installed media (post-installation):


9 product docu.png


  • The documentation at help.sap.com perfectly complements the RKT learning maps; you'll be an expert in no time:


11 doc page.png


  • Just to reiterate: the legacy "KXEN" name has been fully retired from the product portfolio; we are now dealing exclusively with SAP Infinite Insight (II):


22 product rename.png


  • This is a snapshot of the combined "PA" and "II" roadmap plans (subject to change). Whilst Infinite Insight's capabilities will strengthen for the next +1 release, incremental features will also be ported to the Predictive Analysis (and hence Lumira) client, and server capabilities will be delegated down to the HANA in-memory processing platform:


555 future integration roadmap.png


  • Focusing specifically on Infinite Insight's next-steps, we will be seeing initially tighter, followed by complete/native integration of ex-KXEN assets into the SAP Predictive Analytics portfolio, in keeping with our commitment to strategic initiatives such as In-Memory, Big Data, Cloud, Mobile and agile Visualization:


666  II_roadmap.png


  • Here's a non-binding illustration of our go-to-market intentions for 2014. These estimated timelines are subject to change and purely communicated in the spirit of openness:


555 future integration roadmap FULL overview.png


  • One thing is for sure: PA will be the interface going forward (so that Infinite Insight can benefit from its flashy CVOM visualization gallery and HTML5 agility). Our first expectation is that the ex-KXEN proprietary algorithms will start to appear in the Predictive Analysis Designer:


33 kxen into PA.png


  • We're going to harness the processing power of HANA's in-memory platform to maximize the reach of KXEN's unique approach to data mining. Infinite Insight algorithms are going to be rewritten into HANA as 'function libraries' that can be called by the Application Foundation Modeler or other SAP apps:


99 lab preview KFL into hana.png


  • As mentioned already, we have a vision of a unified client. A single desktop experience that will cover the full spectrum of use-cases, from the casual end-user Lumira 'visualize' workflows, through to business-users wanting to 'predict', through to analysts/scientists wanting to 'analyze' deeper.
  • Here's a mock-up of what that could look like, as the user is guided into the application:


99 lab preview unified client.png


  • Other innovations we might see could include an intuitive "drag to forecast" - how pleasant an experience that would be on a tablet device!


99 visual drag to forecast.png


  • One thing is for sure: Infinite Insight's advanced statistical charts will massively benefit from the refresh they are about to receive from inclusion in the Lumira suite (CVOM charting and HTML5). We can envisage drillable charts to find influencers, similar to the BO Set Analysis of old:


999 drillage influencers chart.png


  • This all ties in very significantly with the wider plans for SAP Lumira integration and our roll-out plans for the SERVER version, about which more info can be found on the GA Announcement page:


99999 Lum_srv_plans.png



On this rainy Sunday, having been sent out of the house because of some serious tidying up and cleaning going on inside,


I decided to find somewhere to grab a coffee, listen to some awesome music and explore possibilities with R programming.


Since I am not very proficient in coding, I will look for some ready-made code on the internet and try to adapt it.


I believe some social media content will really make my demos and presentations shine, so let's see what we can do with the Facebook API.


The resulting component link is at the bottom of this page, ready to be used.


What do we want to achieve? See this viz.




First Step :


Log on to FB, visit this page, and click on "Get Access Token" (don't forget to authorize for friends data):




ScreenHunter_326 Mar. 09 14.19.jpg



Here's the GitHub link to the original code I found online. It gets your friend list and their friends to plot a network cluster visualizing their connections to each other. It will be interesting to try:




We have to wrap this inside a function, add a print to actually plot the graph, and make the API key a variable so we can pass it from the algorithm properties page.


ScreenHunter_327 Mar. 09 14.26.jpg
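In case the screenshot is hard to read, here is a hypothetical sketch of such a wrapper. The package choices (Rfacebook for the Graph API call, igraph for the cluster plot) and the mytoken parameter name are my assumptions, not necessarily what the original GitHub code used:

```r
mymain <- function(mydata, mytoken) {
  ## Assumed packages: Rfacebook for the Graph API, igraph for plotting
  library(Rfacebook)
  library(igraph)

  ## Adjacency matrix of which of my friends know each other
  ## (mytoken is the access token copied from the Graph API Explorer page)
  mat <- getNetwork(mytoken, format = "adj.matrix")

  ## Build the graph, detect communities (work, university, family, ...)
  ## and print the plot so that SAP PA actually renders it
  g <- graph.adjacency(mat, mode = "undirected")
  comm <- fastgreedy.community(g)
  print(plot(comm, g, vertex.label = NA))

  ## SAP PA components are expected to return a data frame
  return(as.data.frame(mydata))
}
```

The only SAP PA-specific parts are the function signature (input data frame plus a configurable parameter) and the print() around the plot.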


Now let's configure our key and run the component :


ScreenHunter_330 Mar. 09 14.35.jpg

ScreenHunter_328 Mar. 09 14.30.jpg


Run the component and...!!!


ScreenHunter_329 Mar. 09 14.32.jpg


The code basically plots the clusters and creates a legend showing the names of people right at the center of each cluster. You can tweak the code to change plot parameters, which will probably make it look visually more appealing, like this:


ScreenHunter_332 Mar. 09 15.12.jpg



Looking at the plot and the names (which I didn't include in the screenshot), I understand that the clusters are mainly:


1) Work network

2) University friends

3) High-school friends,

4) Elementary school friends (yes, we did find each other via Facebook)

5) Family


I believe that's a simple enough example of what clustering is: understanding different subcategories within a big list.


My next aim will be to replicate something similar but using a "product page" from Facebook, visualizing the people who "liked" the page. Any help is highly appreciated.


Here's the re-usable component for SAP Predictive Analysis; download it and paste it into the appropriate folder, which may look like:


C:\Users\"usernamecomeshere"\SAP Predictive Components\RScript


Please note that the code is provided "as-is" and is not supported by SAP.



Happy coding



In an environment where you are using SAP Predictive Analysis together with SAP HANA integrated with an R server, you might not always have OS access to the R server and can therefore not see which R packages are installed. This impacts the use of SAP Predictive Analysis when using SAP HANA Online or "Connect to SAP HANA" connectivity and the built-in R algorithms or custom R algorithms.


If you build a SAP Predictive Analysis custom R component using SAP HANA Offline or other local files, and the required package (algorithm) is installed on your local R laptop, it will normally work. However, to get it working using SAP HANA Online, the packages also need to be installed on the R server integrated with SAP HANA. In a hosted environment you might not be able to get direct access to the R server to check which packages are installed.


Below is a step-by-step description on how to see which packages are installed on the R server integrated with SAP HANA using SAP HANA Studio.


From SAP HANA Studio select the "SQL" to open a SQL console for the SAP HANA system which is connected to the R server.

HANA Studio.png

Type the following script. Replace "DATA" with your own schema.

R Script.png

Execute the script (mark the script and click F8).


The results from running the script. The result below might differ from yours, depending on the packages installed on your R server.



If you would like to get more information on the packages installed on the R server, these are the additional parameters available:

"Package"               "LibPath"               "Version"               "Priority"            

"Depends"               "Imports"               "LinkingTo"             "Suggests"            

"Enhances"              "License"               "License_is_FOSS"       "License_restricts_use"

"OS_type"               "MD5sum"                "NeedsCompilation"      "Built" 
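These parameter names are simply the column names of the matrix returned by R's built-in installed.packages(), which the script above wraps. You can list them directly in any R console:

```r
## installed.packages() returns one row per installed package; its columns
## are exactly the fields listed above ("Package", "LibPath", "Version", ...)
pkgs <- installed.packages()
print(colnames(pkgs))

## To expose more fields in the HANA-side table, pick extra columns here,
## e.g. just name and version:
print(head(pkgs[, c("Package", "Version")]))
```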




Best regards,


Kurt Holst



Here is the script if you wish to copy and paste:



DROP TABLE "InstalledPackages";
CREATE COLUMN TABLE "InstalledPackages" (
"Package" VARCHAR(100), "LibPath" VARCHAR(100), "Version" VARCHAR(100), "Depends" VARCHAR(100)
);

CREATE PROCEDURE RSCRIPT(OUT result "InstalledPackages")
LANGUAGE RLANG AS
BEGIN
result <- data.frame(installed.packages())
result1 <- (as.character(result$Package))
result2 <- (as.character(result$LibPath))
result3 <- (as.character(result$Version))
result4 <- (as.character(result$Depends))
result <- data.frame(Package=result1,LibPath=result2, Version=result3, Depends=result4)
END;

CALL RSCRIPT("InstalledPackages") WITH OVERVIEW;

SELECT * FROM "InstalledPackages";

Completing the telecom analytics series using SAP PA, I used a different data set for Association Analysis and Forecasting.

For Association Analysis I used the prepaid dataset, which had subscriber plans. For compliance's sake I changed the names of the plans.

Association analysis, sometimes also referred to as market basket analysis or affinity analysis, is used to find the strongest product purchase associations or combinations. Here, prepaid customer data identifying the recharges subscribers have done over the last 6 months was used. The idea was to find associations between recharge patterns. I had to create a custom R component to delete duplicate data and get unique data, as association analysis needs to be done on unique data only.
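The deduplication step itself needs nothing more than base R's duplicated(); here is a minimal sketch with made-up customer/plan rows (the column names are illustrative, not the real dataset's):

```r
## Toy recharge data: customer C1 recharged the 3G Monthly plan twice
mydata <- data.frame(
  Customer = c("C1", "C1", "C1", "C2", "C2"),
  Plan     = c("3G Monthly", "3G Monthly", "SMS Plan", "Net 49", "Net 49"),
  stringsAsFactors = FALSE
)

## duplicated() flags repeated Customer/Plan rows; dropping them leaves
## one row per customer/plan pair, which is what Apriori expects
unique.data <- mydata[!duplicated(mydata), ]
print(unique.data)  # 3 unique rows remain
```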

Using the Apriori algorithm in SAP PA, it was found that users buying the 3G Monthly plan and the SMS plan frequently opted for a Net 49 plan. Hence these 3 plans can be combined in the future to create one single comprehensive plan. Another thing that can be noticed here is that people opting for the corporate plan along with the video calling facility rarely used the 3G quarterly plan, while people opting for Corporate along with 3G quarterly opted for Video Calling. This certainly indicates some sort of pricing issue in the plans that can be sorted out.
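The support/confidence arithmetic behind a rule such as {3G Monthly, SMS Plan} -> {Net 49} can be sketched in base R. The five toy baskets below are invented for illustration and are not the actual prepaid data:

```r
## Five toy recharge baskets (one per subscriber)
baskets <- list(
  c("3G Monthly", "SMS Plan", "Net 49"),
  c("3G Monthly", "SMS Plan", "Net 49"),
  c("3G Monthly", "SMS Plan"),
  c("Corporate", "Video Calling"),
  c("Corporate", "3G Quarterly", "Video Calling")
)

## Support = fraction of baskets containing all the given items
support <- function(items) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

supp.lhs   <- support(c("3G Monthly", "SMS Plan"))            # 3/5 = 0.6
supp.rule  <- support(c("3G Monthly", "SMS Plan", "Net 49"))  # 2/5 = 0.4
confidence <- supp.rule / supp.lhs                            # 2/3, ~0.67

print(c(support = supp.rule, confidence = confidence))
```

Apriori essentially searches for all rules whose support and confidence clear the chosen thresholds, pruning itemsets whose support is already too low.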


Forecasting the Subscriber Base using Winters' method:

Winters' method, sometimes also referred to as triple exponential smoothing, was used to forecast the future subscriber base for the company using SAP PA. Historical data of the subscriber base from 1985 to 2010 was used to predict the forecasted subscribers.

The green line indicates the predicted number of subscribers, while the blue bars represent the actual number of subscribers. The prediction vs. actual for the years 1985 to 2010 shows that the analysis is pretty close to the actuals. It also forecasts future demand from 2011 onward. By looking at the graph we can say that the model we designed is pretty good, but how do we ensure quantitatively that the model is actually worthwhile? For this we need to refer to the "Algorithm Summary" window in the Predicted output. The goodness of fit of 0.94 means that the fit explains ~94% of the total variation in the data about the average, indicating that the model is indeed a good one.
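The same technique is available in base R as HoltWinters() (triple exponential smoothing), which makes it easy to experiment outside SAP PA. The sketch below uses R's built-in AirPassengers series as a stand-in for the subscriber data:

```r
## Fit a Holt-Winters model with level, trend and seasonal components
fit <- HoltWinters(AirPassengers)

## Forecast the next 12 periods, analogous to predicting 2011 onward
fc <- predict(fit, n.ahead = 12)
print(fc)

## A goodness-of-fit style check: squared correlation between the fitted
## values and the actuals (the blog's 0.94 plays a similar role)
actual <- window(AirPassengers, start = start(fit$fitted))
print(cor(fit$fitted[, "xhat"], actual)^2)
```

Note that HoltWinters() needs at least two full seasonal cycles of history to estimate the seasonal component, just as the blog's model needed a long 1985-2010 history.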



Big data and predictive analytics are great for preventative maintenance, and are an incredibly powerful platform for the future of healthcare.


The SAP HANA powered mMR Predictive Analysis App already monitors patients' vital signs in real time and generates an alert when abnormal readings signal a possible emergency situation.


Can something like this be far away?


This was an open webcast last Monday.  Clarissa Dold covered what is new; see What's New with SAP Predictive Analysis SP14?


Please note that the usual legal disclaimer applies that anything in the future is subject to change.


Figure 1: Source: SAP


Figure 1 describes what predictive analytics is: analysis to "support predictions".




Figure 2: Source: SAP


Figure 2 covers where Predictive Analysis is used.  For me, forecasting and trends are the most popular, but businesses certainly also look at churn and turnover.



Figure 3: Source: SAP


Figure 3 covers R integration, the PAL, Predictive Analysis and SAP Infinite Insight (the new name for KXEN).  These are four separate items, and I think this SAP Press book gives a very good overview of the PAL and Predictive Analysis.



Figure 4: Source: SAP


The speaker provided a demo of SAP Predictive Analysis.  One of the upcoming features is the ability to share Predictive Storyboards to Lumira Server (and my take only - will help mobilize your predictive stories).


Figure 5: Source: SAP


Figure 5 was part of the demo, showing that once you set up your predictive model, you could export it.  What is good about that?  You can save the settings in the model and then reuse it.


Figure 6: Source: SAP


Figure 6 is a summary of features in the current version of Predictive 1.14.


Figure 7: Source: SAP


Figure 7 is the high-level architecture for Predictive Analysis.  You can see how it builds on SAP Lumira.


The top of Figure 7 reflects the menu in Predictive Analysis.  At the bottom, you can import/configure R packages, which I found very simple in the current version.



Figure 8: Source: SAP


Infinite Insight is the new name for KXEN.  The KXEN algorithms are planned to become part of Predictive Analysis.  The future direction is the convergence of Predictive Analysis and SAP Infinite Insight; "future direction" is typically 12 to 18 months out.  Interesting too that predictive consumption and scoring is planned for the cloud.

Question and Answer


Q: When we connect live to HANA, we do not get the formula option.

A: Data manipulation is not supported in HANA online mode; it is planned for later in 2014. One option is to build a HANA analytic view.


Q: When will Infinite insight be included in PA?

A: Starting in Q2



In March there are Predictive sessions at BI2014


At the ASUG Annual Conference in June we have a very special session planned covering Predictive Analysis - details to be announced.

SAP Predictive Analysis 1.14 Technical Overview was on January 27th  with Ashok Kumar (Predictive Analysis Product Owner).

Listen to the recording on what's new in SP14 from Ashok!





Note: You will need an S-user id.

Gone are the days when we decided on a list of items before going to the grocer and asked him for exactly those items. These days we go to a supermarket or a multi-brand provisions store: we have our own list of items to buy, and the storekeeper has his own list of items to sell us. At the end, what we have bought is usually more than we had actually planned, because we live in a world where all our activities are tracked. Whatever we buy, the purchase history is maintained by the store. Based on this history, the store or the multi-brand company analyses and predicts our interests, even to the extent of predicting what our next purchase will be. This is even more evident in the online market. From the moment we surf the internet and look at something of interest, all our page visits are tracked and studied. We soon start receiving promotional email notifications based on our interests, or the search results are refined based on our last login and browsing over the internet. All of this is made possible by predictive analytics technology.

SAP also has its own predictive analytics tool, SAP PA (Predictive Analysis). SAP has acquired KXEN, a leader in predictive analytics, and with this acquisition has strengthened its name in the predictive analytics space. SAP PA provides the front end, with KXEN capabilities at the back end, and it can consume algorithms from different languages and tools.

For any predictive analytics tool there is a need to constantly train the algorithm with data. Your predictive algorithm may give results that are 98% close to the actual data, but there are also cases where it gives results that are only around 50% accurate, because the data the algorithm is run on is totally new. In such cases the predictive results are very poor; that is the reason the algorithm constantly needs training and tuning.
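A standard way to keep an algorithm honest is to evaluate it on data it was never trained on, and retrain when that hold-out accuracy drifts. A minimal Python sketch of the idea, with synthetic data and a deliberately simple one-feature "model" (everything here is invented for illustration):

```python
import random

random.seed(7)
# Hypothetical labelled records: (usage feature, churned?); churn iff usage > 50.
data = [(x, x > 50) for x in (random.uniform(0, 100) for _ in range(200))]

# Hold out 25% of the records that the model never sees during training.
random.shuffle(data)
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]

# "Train" a one-feature classifier: threshold halfway between the class means.
pos = [x for x, churned in train if churned]
neg = [x for x, churned in train if not churned]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(rows):
    return sum((x > threshold) == churned for x, churned in rows) / len(rows)

# The hold-out figure is the honest performance estimate; when it drops
# on fresh data, the model needs retraining and tuning.
print(f"train: {accuracy(train):.2f}  test: {accuracy(test):.2f}")
```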

With SAP PA, one can integrate data from SAP HANA and study the predictions. You can also integrate algorithms written in R and other statistical languages into SAP PA, and integrate the SAP HANA PAL, among others.





Hi everyone, are you interested in hearing from Ashok Kumar KN (Product owner of Predictive Analysis) on the new features of SP14?

Join the webinar on January 27th at 8-9am PST.


URL: https://sap.emea.pgiconnect.com/pra

Duration: ~ 1 hour


Conference Access Code 111166


Phone details: per-country dial-in numbers (toll-free and local lines for locations worldwide, including São Paulo, Tel Aviv, Saudi Arabia, South Africa, South Korea and New York) were listed here.

Last month, I hosted a “Big Data Hangout” featuring SAP’s Shekhar Iyer, Global VP Analytics & Big Data and David Ginsberg, Chief Data Scientist.

The hour-long discussion covered different topics about data science and data scientists, including questions from the audience:

  • What is a data scientist and what do they do?
  • Why now?
  • What kinds of business challenges does Data Science address?
  • Example use cases
  • What kinds of skills are needed?
  • What SAP technologies are we talking about?
  • Typical barriers to success?
  • What’s going to happen in the future?


You can find out more about SAP’s Data Science at http://www.sapbigdata.com/ – including the Data Science services available through David Ginsberg’s team


You can view the whole session above, or view the YouTube playlist to go directly to a topic of interest.


Missed the previous Big Data Hangout sessions? See them here:

Bimal Mehta

Telecom Analytics - Part1

Posted by Bimal Mehta Dec 20, 2013

Recently I got a chance to do a small demo in SAP PA for telecom. The initial part, getting correct data, was the biggest challenge. However, I found a university site that had some data sets; I requested one data set and they obliged.

The Center for Customer Relationship Management at Duke University

The dataset I used was from the Duke/NCR Teradata 2003 Tournament (quite old, I know, but it served the purpose for a demo).

The data was solicited from a major wireless telecom, which provided customer-level data for an international modeling competition. The data was suitable for churn modeling and prediction, and covered 100,000 customers with at least 6 months of service history. For the demo, however, I used a sample of only 3,400 records.

Historical information was provided in the form of

    • Type and price of current handset
    • Total revenue (expenditure)
    • Call behavior statistics (type, number, duration, totals, etc.)
    • Demographic and geographical information

Below is a snapshot of the model that I built in SAP PA. I will not talk about the process of building the models but will focus on the results. For any information on building the model, you can put your query in the comments section.

Some of the key outputs and their interpretations are shown below.



Below are the results of the different analyses that I ran on the data set:


1. Clustering

Customers were clustered into 5 clusters based on Total Monthly Calls, Night Billing, Day Billing, Total Revenue and the number of cars owned by the customer.

It was observed that 4 of the clusters were high-density clusters.  Cluster 5 was the densest, with 1,629 of the 3,400 customers falling into it. On analysis, Cluster 5 consisted of high-income customers generating average revenue (possible campaign targets), while Cluster 4 had customers with low income and very high call usage (possible fraudsters). The tree map gives the state-wise break-up for the same.
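For intuition, the clustering step works roughly like the k-means sketch below (Python, with two made-up segments on two of the five features; SAP PA's own clustering component and parameters may differ):

```python
import random

random.seed(0)
# Two made-up customer segments: (total monthly calls, total revenue).
customers = ([(random.gauss(200, 20), random.gauss(40, 5)) for _ in range(50)] +
             [(random.gauss(800, 50), random.gauss(120, 10)) for _ in range(50)])

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    centroids = random.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                        (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep old if empty).
        centroids = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                     if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans(customers, k=2)
print(sorted(len(c) for c in clusters))
```

Cluster "density" in the post corresponds to how many customers end up in each cluster list.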




2. Decision Tree

A decision tree using the R-CNR tree algorithm was created to study the existing churn in the telecom dataset.

The chart represents the chances of churn based on several factors like day charge, evening charge, net usage, handset price, etc. This type of chart is called a decision tree, a classic classification tool in data mining systems. The R-CNR Tree method was used for the analysis in this scenario. Generation is based on the top-down principle: the starting point (the root) contains all records of the training set, which is divided, with the aid of rules defined by the variables, into two or more sub-nodes.

By analyzing the diagram we get various profiles of customers who have a high probability of churn.

E.g.: customers with day usage charges (for a fortnight) of more than $44 are more likely to churn. Similarly, it was found that people living in small and medium households had a higher churn probability than those living in luxury apartments, and customers with high-priced handsets churned 60% more than those with low-priced handsets, indicating that they were only interested in the handsets.
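The top-down principle amounts to repeatedly choosing the split that makes the child nodes purest. Here is a hypothetical sketch of one such step in Python, using Gini impurity and invented records loosely inspired by the day-charge pattern (the actual R-CNR tree may use different criteria and data):

```python
# Made-up records: (fortnightly day charge in $, churned?).
records = [(20, False), (25, False), (30, False), (41, False),
           (45, True), (50, True), (60, True), (70, True)]

def gini(rows):
    """Gini impurity of a set of labelled rows (0 = pure node)."""
    if not rows:
        return 0.0
    p = sum(churned for _, churned in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows):
    """Top-down step: the threshold minimising the children's weighted impurity."""
    best_threshold, best_score = None, float("inf")
    for threshold, _ in rows:
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

threshold, impurity = best_split(records)
print(threshold, impurity)  # 41 0.0 — "day charge > $41" separates churners perfectly
```

A real tree recurses on each child node until the nodes are pure enough or too small.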




3. Churn Prediction using Neural Network

The neural network algorithm is a machine learning algorithm that can predict a dependent variable based on several independent variables and the historical association between the two. In our case we try to predict whether a future customer will churn or not based on the historical analysis of customers who have churned. The number of Hidden Layer Neurons was 5. The iterations were set to 1000. The results obtained were pretty good with the variance being <15% when compared to original values. The graph below shows predicted churn for each of the states in USA.

A quick glance shows that New Mexico has the highest percentage of predicted churn, followed by Wyoming and California. The lowest churn is predicted in Alabama, at around 14%. Clearly the lifestyle and work location of the customer are important factors, as we can see.
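To make the learning loop concrete, here is the smallest possible "network" — a single logistic neuron trained by stochastic gradient descent on made-up, normalised data. The PA run used 5 hidden-layer neurons and far richer inputs; this only illustrates how predictions are iteratively corrected against historical labels:

```python
import math

# Made-up churn data: (normalised day charge, churned 0/1), separable around 0.5.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

# One logistic "neuron": weight and bias adjusted on every training example.
w, b = 0.0, 0.0
for _ in range(1000):                          # iteration count, as in the PA run
    for x, y in data:
        p = 1 / (1 + math.exp(-(w * x + b)))   # predicted churn probability
        w += 0.5 * (y - p) * x                 # gradient step, learning rate 0.5
        b += 0.5 * (y - p)

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b))) > 0.5

correct = sum(predict(x) == y for x, y in data)
print(f"{correct}/{len(data)} classified correctly")
```

The same comparison of predicted vs. original values is what the post's "<15% variance" claim refers to, just at a much larger scale.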


4. Outliers

The outlier algorithm is mainly used to identify errors in a dataset, detect fraud, or spot out-performers. The idea here was to find the outliers in each region based on the various customer parameters in the dataset.


229 customers were found to be outliers. The region-wise break-up of outliers is shown below, with the customer's property type fixed to City (customers residing in cities only).

The mean monthly usage (total calls including net) of the 3,333 customers is 21,428 minutes. Based on that, the upper fence was set at 62,967 minutes, and 229 customers were using the service above it. These 229 customers could be either fraudulent or highly valuable; further detailed analysis of them can help us decide how to tag them.
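The post does not state the exact fence rule used, but a common choice is Tukey's rule, where the upper fence is Q3 + 1.5 × IQR. A small Python sketch on an invented usage sample:

```python
# Hypothetical monthly-usage sample (minutes); the real run used 3,333 customers.
usage = [1200, 8000, 15000, 21000, 26000, 34000, 52000, 90000]

def upper_fence(values, k=1.5):
    """Tukey's rule: anything above Q3 + k * IQR is flagged as an outlier."""
    s = sorted(values)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(s) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    return q3 + k * (q3 - q1)

fence = upper_fence(usage)
outliers = [v for v in usage if v > fence]
print(fence, outliers)  # 76375.0 [90000]
```

Whatever rule is used, the flagged customers are only candidates; as the post says, deciding whether they are fraudulent or valuable needs further analysis.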




