
SAP Predictive Analysis


Hadoop and predictive analytics are some of the most exciting technologies for businesses today, but both are often seen as having a steep learning curve. While they are complex, getting started is simple: the Hortonworks Sandbox provides a ready-made Hadoop environment, and SAP InfiniteInsight makes predictive analytics intuitive for both data scientists and business users. In just 3 easy steps, you can set up your own Hadoop cluster and tackle real predictive use cases!


1. Install

First you’ll need to install 3 components:

1. VirtualBox for the virtualization environment: https://www.virtualbox.org/wiki/Downloads

2. Hortonworks Sandbox with HDP 2.2 image: http://hortonworks.com/products/hortonworks-sandbox/ [Go to ‘Download & Install’ tab and select either Mac or Windows for VirtualBox]

3. SAP InfiniteInsight 7.0: http://bit.ly/1t77brW [Trial]


Once you’ve installed VirtualBox, open the Hortonworks Sandbox .ova file and VirtualBox will automatically import it. Hit ‘Start’ and you now have a fully functional Hadoop environment!





2. Connect

Next we simply set up the connection from SAP InfiniteInsight to Hadoop using an ODBC connection. Download and install the Hive ODBC driver here: http://hortonworks.com/hdp/addons/.


After installation, open up your ODBC Administrator and under the System DSN tab, “Sample Hortonworks Hive DSN” is now available.



Configure it with the IP address from the startup screen of your Hadoop environment, filling in the remaining fields as shown below.



Test the connection and you have now successfully added Hadoop as a data source for InfiniteInsight.


TIP: <ip address>:8888 in your browser is your Hadoop homepage, giving you access to Hive, HDFS, and more.
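If you ever want to reuse the same DSN outside InfiniteInsight, any ODBC client can pick it up. Here's a hedged sketch in Python: the DSN name matches the driver's sample DSN above, but the `hue` user is an assumption about a typical sandbox setup, and the commented `pyodbc` lines would only work with the driver actually installed.

```python
# Illustrative only: build a DSN-based ODBC connection string for the Hive
# DSN configured above. The "hue" user is an assumed sandbox default.

def hive_conn_str(dsn="Sample Hortonworks Hive DSN", uid="hue", pwd=""):
    """Assemble a DSN-based ODBC connection string."""
    parts = [f"DSN={dsn}", f"UID={uid}"]
    if pwd:
        parts.append(f"PWD={pwd}")
    return ";".join(parts)

# With a working driver installed you could then connect, e.g.:
#   import pyodbc
#   conn = pyodbc.connect(hive_conn_str(), autocommit=True)
#   rows = conn.cursor().execute("SELECT * FROM sample_07 LIMIT 5").fetchall()
print(hive_conn_str())  # DSN=Sample Hortonworks Hive DSN;UID=hue
```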


3. Predict

Now that everything is set up, you’re ready to do predictive analytics! Open InfiniteInsight and we’ll ‘Create a Clustering Model’ based on the sample tables in Hadoop. Select the ‘Data Type’ as ‘Database’ and choose "default".sample_07, which shows various job titles with the total number of employees and salaries.


TIP: Check out this great tutorial for uploading your own datasets into Hadoop: http://hortonworks.com/hadoop-tutorial/loading-data-into-the-hortonworks-sandbox/



On the next screen, hit the ‘Analyze’ icon and continue with ‘Next’ and ‘Generate’ leaving the default settings and voila, we’ve done it!
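InfiniteInsight hides the algorithmic details behind those few clicks. For the curious, this stdlib-only toy sketch shows the kind of grouping a clustering algorithm performs on salary data like sample_07 (the numbers are made up, and InfiniteInsight's actual clustering is far more sophisticated than this naive 1-D k-means):

```python
# Naive 1-D k-means on toy salary figures: assign each value to its
# nearest center, then move each center to the mean of its members.

def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

salaries = [18000, 21000, 24000, 52000, 55000, 95000, 110000]
print(kmeans_1d(salaries, centers=[20000, 60000, 100000]))
# three salary clusters emerge: low, mid and high earners
```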



We’ve set up our Hadoop environment and performed a clustering analysis on the fly with SAP InfiniteInsight in 3 easy steps. Give it a spin and please leave any feedback below.

Hi Everyone,


We now have a customer-facing website for suggesting product enhancements, dedicated to SAP Predictive Analytics (a.k.a. InfiniteInsight and Predictive Analysis): https://ideas.sap.com/PredictiveAnalytics


The Product Management team is looking forward to your suggestions!


Kind regards,



Some users might have encountered a situation where an R script runs well in the R console or RStudio but throws an error in PA about not being able to find an R package that is already installed. For example, the screenshot below shows an error I had; it is thrown by the PA custom R component when loading the rgl package.


3dplot throws error in PA.PNG


When you get this error, one potential cause is that the R_LIBS environment variable was not set successfully. In general, R_LIBS needs to point to all folders where your R packages are installed. Use the .libPaths() function to check where your R libraries are installed.


calling libPaths function.PNG

On my machine, R packages are installed in two folders, as shown above. One potential solution is to append all folders returned by .libPaths() to the system environment variable R_LIBS, separated by semicolons (;). Create R_LIBS if it does not exist.
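For example, if .libPaths() returned the two folders below (the paths are made up; substitute your own .libPaths() output), the R_LIBS value to set would be their semicolon-joined concatenation. This Python one-liner merely illustrates the expected format of the value:

```python
# Illustration of the R_LIBS value to set: take every folder reported by
# .libPaths() in R and join them with semicolons (the Windows separator).
lib_paths = [
    r"C:\Program Files\R\R-3.1.2\library",   # assumed example paths --
    r"C:\Users\me\Documents\R\win-library",  # use your .libPaths() output
]
r_libs = ";".join(lib_paths)
print(r_libs)
```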

Following on from Part 1, I now want to show you how we can easily do some further analysis of the Basket Analysis output using SAP HANA Studio, SAP Predictive Analysis (PA) and/or SAP Lumira.


The output of the Basket Analysis (HANA Apriori algorithm) is straightforward: it shows the product purchased (PreRule), the secondary product(s) (PostRule), and then some calculated fields: Lift, Support and Confidence. Wikipedia explains exactly how they are calculated. We can choose whether we are looking for a match between 2 products or more than 2 by using Apriori or Apriori Lite.


I have found the Lift to be the most useful of these columns, as it combines the Support and Confidence to give you an idea of how good the rule is: for example, does it occur frequently, and if one product is found, how likely is it that the other product will be in the same basket?
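To make the three measures concrete, here is a small worked example over five toy baskets (made-up data, not the blog's dataset) for a rule "beer -> crisps":

```python
# Worked toy example of Support, Confidence and Lift.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps", "milk"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"crisps", "milk"},
]
n = len(baskets)
support_both = sum(1 for b in baskets if {"beer", "crisps"} <= b) / n  # P(beer and crisps)
support_pre = sum(1 for b in baskets if "beer" in b) / n               # P(beer)
support_post = sum(1 for b in baskets if "crisps" in b) / n            # P(crisps)
confidence = support_both / support_pre   # P(crisps | beer)
lift = confidence / support_post          # > 1 means a positive association
print(support_both, confidence, lift)     # 0.4, ~0.667, ~1.111
```

A lift above 1 tells you the two products appear together more often than chance alone would predict, which is why it combines the other two measures so usefully.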


First, in SAP HANA Studio we can easily see this output and build a simple Analytic View (AV) for us to consume in PA/Lumira.

The Analytic View allows us to derive a new field and then specify some metadata such as the measure columns.

Here's the output table definition we created in Part1


And here's the data that it holds


We can build a simple Analytic View with just this table


I have found it useful to combine the PreRule and PostRule into a single field, so we will do this with a calculated column as below.

Calculated Column.png

Now we do need to define the aggregation for the measures. None of the standard aggregations is really appropriate here, but since we will also include the Rule (or the PreRule and PostRule) in the view, SUM works fine.


We can now use Lumira / Predictive Analysis to create some further visualisations using this Analytic View.

See Below for a selection that I have created in just a few clicks.

Basket Analysis Heat Map.png

Basket Analysis Tree Map .png

Basket Analysis Tag Cloud.png

Basket Analysis Network Map.png

Basket Analysis Table Output v2.png

I have written another blog, The SAP HANA Effect, looking at how this changes the business process and analysis of basket data; it can be found in the SAP HANA and In-Memory forum. I am also looking to compare the different HANA algorithms to see how their results and performance compare.


Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever; if you search online, you'll find examples from the 80s and early 90s. One thing to note: Basket Analysis should really be classed as Advanced Analytics, as on its own it's not really Predictive Analytics (you are not predicting anything; it is more data mining), but all 3 terms are often used interchangeably. What a predictive tool or predictive algorithm gives you is easy access to this type of analysis without having to be a SQL guru, propeller head or even a Data Scientist.

Basket Analysis is something that has been possible for a long time, but the ability to do this quickly and easily on a large set of data is where many hit problems. With this in mind I wanted to share a couple of days work that I have done recently to improve this experience.

HANA Algorithms
With SAP HANA SP9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more commonly known:

  • Apriori
  • Apriori Lite
  • FP-Growth
  • KORD

I'm not going to get into the technical details around the differences between these, the HANA PAL Documentation provides more details of each.  I plan to write a follow up blog that will aim to explore any differences between these.

What I have done is use SAP Predictive Analysis v1.21 with SAP HANA Revision 85, on an 80+ million record data set; some may call this Big Data, others may not. The data required for this task is straightforward: you just need 2 columns, the item(s) purchased and the transaction number.
Source Data.png
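Before mining, those two-column rows are conceptually grouped into one basket per transaction. A minimal stdlib sketch of that step, with a few toy rows (not the 80M-record dataset):

```python
# Group (transaction_no, item) pairs into baskets before rule mining.
from collections import defaultdict

rows = [  # toy (transaction_no, item) pairs, as in the source table
    (1, "beer"), (1, "crisps"),
    (2, "beer"), (2, "crisps"), (2, "milk"),
    (3, "milk"),
]
baskets = defaultdict(set)
for txn, item in rows:
    baskets[txn].add(item)

print(dict(baskets))  # one set of items per transaction
```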

Normally with SAP HANA you might build an Analytic View to invoke the HANA OLAP engine; however, OLAP is typically about aggregating data and analysing it by slicing, dicing and drilling down. For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.

You can then feed this into SAP Predictive Analysis (PA); if you want to use the in-built HANA predictive libraries, you need to "Connect to SAP HANA".

Connect to HANA Full.png

Select your source data

HANA Source Data.png

Drag in the HANA Apriori node
Apriori Node in PA.png

Configure the node parameters, which are fairly self-explanatory. Depending upon the basket size in your data set, you may need to set the support quite low.

Actual Apriori Parameters.png

I have then chosen to output the resulting rules to a new HANA table so that we can easily analyse the rules generated.

HANA Writer Node.png

You can then run the analysis and receive the output in just a couple of minutes.

Execution Status.png

If you switch to the results view, you then receive some pre-built analysis showing you the rules that have been generated.

PA Results v2.png

Predictive Analysis Association Chart - Tag Cloud

PA Tag Cloud v2.png

As you can see, the output is fairly readable, but you may still want to analyse the rules further to understand what they are telling you.

I have now written some follow-up blogs: Part 2 on visualisation of results, and another looking at The SAP HANA Effect. In the future I also plan to explore the other HANA algorithms and compare their results and performance.

Predictive analytics has recently seen a spike of excitement among many different business departments, such as marketing or human resources, who seek to better understand their customers or would like to look at how employees behave in their organization and improve the services offered to their clients. Unfortunately, only very few business departments have access to Data Scientists, and therefore they often have little experience in developing predictive models. This presents a real challenge, since predictive analytics is fundamentally different from traditional reporting, and without Data Science support you might find it hard to get started and to feel confident in the results of your analyses. Luckily, SAP InfiniteInsight addresses this challenge directly and can easily be used by analysts, since it greatly reduces the complexity of data preparation and model estimation through a very high level of automation. This way you can focus on the business questions that matter and spend less time dealing with complicated IT solutions. This blog is geared towards analysts who want to understand how to get the most out of their data using SAP InfiniteInsight, so here's how you would get started with your predictive modeling initiative:



Step 0: Understand the predictive analytics process

Before actually getting started, you should familiarize yourself with the general idea behind predictive analytics and how it differs from traditional business intelligence (the folks over at Data Science Central have a nice summary). In short, when using predictive analytics we want to forecast the probability of a future event based on patterns that we find in historical data for said event: For example, to predict turnover (your target) we will need historical data on turnover along with a bunch of attributes that we can use to find relationships and patterns between the attribute and target variables. Once we have derived the historical relationship and built a valid model, we will use this model on new data to forecast turnover. The forecasted results can then be used to make various business decisions. Now, the actual flow may involve a few side steps (e.g. transforming your data so that it can be used) but in essence this is the high-level process that will be described here.
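The flow above can be sketched in miniature. The "model" below is deliberately trivial (a single threshold on a made-up overtime attribute; all names and numbers are invented), but the train-then-score sequence is the same one InfiniteInsight automates:

```python
# The high-level predictive flow in miniature: learn a pattern from
# labelled historical data, then apply it to new, unlabelled records.

historical = [  # (overtime_hours, left_within_12_months)
    (2, False), (4, False), (5, False), (18, True), (22, True), (25, True),
]

# "Training": place the threshold midway between the two class means.
left = [h for h, y in historical if y]
stayed = [h for h, y in historical if not y]
threshold = (sum(left) / len(left) + sum(stayed) / len(stayed)) / 2

# "Scoring": apply the learned threshold to new data.
new_employees = {"anna": 3, "ben": 21}
predictions = {name: hours > threshold for name, hours in new_employees.items()}
print(threshold, predictions)
```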



Step 1: Define your business objective

Whether it's wanting to predict which customers will buy your newly launched product or which employees might leave your company, you need to define what your business objective is and clarify how you want to measure it. This sounds trivial but can present a real challenge, since you need to have historical data available for your target outcome that is sufficiently accurate to derive a statistical model in a later step, not to speak of having your target variable available in the first place.



While it’s certainly possible to “just play around” and see what happens (sometimes referred to as exploratory analysis), you will gain better results if you focus your efforts on a single business question from the very beginning. You will also find it easier to gain end-user acceptance if you know what challenge your users are facing and how your analysis can help them solve it.



Step 2: Find & connect to the data

Depending on your business objective, you will now need to find the data to base your model on. You don't need a sophisticated concept in mind, but you'll need a general idea of what kind of data you are looking for. With SAP InfiniteInsight there is one simple rule: the more variables you have, the better, since SAP InfiniteInsight will automatically determine which variables should be removed and which add value to the model. Getting the data from an operational system like SuccessFactors Employee Central or SAP CRM can be slightly more difficult than from a Business Warehouse, but the granularity of data available in a BW may not be sufficient for modeling. With operational systems the data usually has the right granularity but is frequently distributed across many different tables, and companies often restrict direct table access to users from IT, so you may face some challenges when trying to get the data from the tables directly. BW, on the other hand, often has a wealth of data, nicely packaged and preprocessed, but you may run into the issue that while the data has all the attributes you're looking for, it may be too aggregated to be used.


The rule of thumb for data granularity is: You need historical data in the same granularity as the concept you want to predict, i.e. if you want to forecast turnover on employee level you need to have the historical data on employee level as well. The good news is that you can always fall back on using a simple flat file with your data in SAP InfiniteInsight so if push comes to shove you can simply ask your IT department to download some data as CSV in the needed format.



Step 3: Derive & interpret the model

Once you have the data, you want to find the model that offers the best tradeoff between describing your training data and predicting new, unknown data: SAP InfiniteInsight can automatically test hundreds of different models at the same time and choose the one that works best for your data and purpose. Hidden in the background, SAP InfiniteInsight also performs many tasks automatically that Data Scientists usually do with traditional tools to improve the quality of your data and the model performance, such as missing value handling, binning, data re-encoding, model cross-validation, etc. This way you can simply point SAP InfiniteInsight to your bucket of data, define which variable to predict and ask the tool to work its magic. All you need to do then is interpret the results (see this blog post for how to interpret a model based on an example from HR).
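The "test many models, keep the best" idea can be illustrated with a toy holdout comparison. The candidate models and data below are invented, and InfiniteInsight's actual selection uses far richer criteria, but the principle of scoring candidates on held-out data and keeping the winner is the same:

```python
# Pick the best of several candidate models by held-out accuracy.
holdout = [(3, False), (6, False), (15, True), (24, True)]  # (hours, left)

candidates = {  # hypothetical threshold models
    "t=5": lambda h: h > 5,
    "t=10": lambda h: h > 10,
    "t=20": lambda h: h > 20,
}

def accuracy(model):
    return sum(model(h) == y for h, y in holdout) / len(holdout)

best = max(candidates, key=lambda name: accuracy(candidates[name]))
print(best, accuracy(candidates[best]))
```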



Step 4: Apply the model

Great – now you have a working model! Next you want to predict new stuff with your model – usually this “stuff” sits somewhere in a database. SAP InfiniteInsight can either directly apply the model to new data (e.g. data that sits somewhere in a table or a flat file) or it can export the model to a database to allow real-time scoring. The first option is more for ad-hoc scoring or further model validation purposes while the second option can be used to continuously score new data as it comes into the database – this way one could include the scored results in some other application or make the information available to other users. However, in the case of in-database scoring you will probably need some involvement from your IT department.



Step 5: Execute on your insights

One of the most important questions of any statistical analysis is: What do you do with the results? How can you reap the benefits of “knowing the future”? Having an idea about what is likely to happen is not enough – now your organization needs to adapt its behavior to either avoid the unpleasant outcomes or gain the positive ones as predicted by the analysis. How this can be done depends heavily on your organization and the analysis context – possible next steps include

  • making the results/model available to a larger audience (e.g. HR Business Partners, marketing managers, etc.) by exporting it to a database to enable real-time application of the model,
  • including the scoring algorithm in a business application (e.g. an SAP system like SAP CRM),
  • developing a one-time action plan based on the results, or
  • designing a larger process to use the analysis results in each cycle of the business process to which it belongs.


Remember to include those employees who are crucial for a successful execution (e.g. usually your business end-users) early in the process and make sure they understand the results and how to leverage the insights. To be accepted, your analysis must be concise, clear, and trustworthy. Try to understand where your stakeholders (e.g. managers, business users, etc.) are coming from and how to communicate the results of the analysis effectively in their business language. A great analysis with great predictive power is only half the battle – whether your business will be able to profit from this will depend on your organization’s ability to close the loop to its operations.




At this point you may feel slightly overwhelmed at the sight of the different aspects that play a role when setting up a predictive analytics initiative. It is true – these things can get really complex, but when using SAP InfiniteInsight they become much simpler than with traditional tools due to the high level of automation. However, to get started quickly and get a feeling for the technology you don’t need to boil the ocean – you can easily take data that is already available to you and see what kind of relationships you can uncover (a trial for SAP InfiniteInsight is available here). You can use this blog post to see an example of how SAP InfiniteInsight can be used with HR data, but the example and the steps described translate well to other business areas. Please feel free to leave any questions or comments!

Many HR departments are looking at predictive analytics as a hot new approach to improve their decision making and offer exciting new services to their business. Luckily, with SAP InfiniteInsight you don’t have to be a Data Scientist to find the valuable insights hidden in your data or build powerful predictive models. Combined with this, SuccessFactors Workforce Analytics provides clean, validated information, bringing together disparate data from multiple systems into one place to enable decision making. Let’s look at a concrete example of how you could use this combination to better understand your workforce and make predictions in areas that really matter to your business.



The Scenario

Meet John – he’s an HR analyst working for a large insurance company and responsible for supporting line of business managers with workforce insights. He’s been monitoring a concerning trend over the last year regarding the turnover of sales managers in the company’s regional offices – his turnover reports in Workforce Analytics have shown significant deviations from the tool’s industry benchmarks. Today, he has a call with Amelia, the global head of sales, to talk about headcount planning. John takes the opportunity to inform Amelia about his findings, only to learn that Amelia was made aware of this phenomenon a few weeks ago by a few of her direct reports: “You know, John – I’m fine with people leaving, a bit of turnover is healthy and keeps our business competitive, but what I’ve been hearing is that we tend to lose the wrong people, namely mid-level sales managers with a great performance record. If an experienced sales employee leaves we take an immediate hit to our numbers so we naturally try very hard to keep them. Our salary is more than competitive and we offer great benefits, so I have trouble imagining what could be the drivers behind this trend. Can you please investigate and let me know what I could do to reverse this development?”



The Data

John discusses his suspicions with some of the other analysts, who have observed similar trends in other lines of business. Some of his colleagues hint that a lack of promotion or a general increase in the readiness to change jobs might have an influence on employees’ propensity to leave. So John decides to extend his analysis beyond sales and include other business functions as well. He prepares a dataset with all the employees in his company as of the end of his company’s last fiscal year (09/2013) and flags employees who have left the company voluntarily within the following 12 months (until 09/2014) to have a basis for his analysis. The dataset also contains a range of variables to assess their influence on turnover, such as previous roles, demographics or performance. The 12 month period for tracking the employee will allow John to anticipate an employee at risk with sufficient lead time to give a manager the opportunity to react if required. Even though John already has some rough hypotheses about what could drive turnover based on his reports in Workforce Analytics, he wants to keep the analysis broad to capture unexpected relationships as well.



The Analysis

John starts up SAP InfiniteInsight and decides to build a classification model to classify the employees in his dataset into those who would leave within the next 12 months and those who would still be with the company.


John connects to the SuccessFactors Workforce Analytics database and selects his dataset as a data source:

02-Select_Dataset - WFA.png

He clicks “Next” and instructs SAP InfiniteInsight to analyze the structure of his dataset by clicking the “Analyze” button.


John is happy with the suggested structure of the dataset – SAP InfiniteInsight has recognized all the fields in his dataset correctly and John doesn’t need to make any changes. He clicks “Next” to progress to the model definition screen:


John can use all the variables in his dataset except for the Employee ID: since this field uniquely identifies every record, it carries no generalizable information for the outcome John would like to model. Therefore he excludes Employee ID from the model definition. As target variable John uses the “Will leave within 12 months” flag from his dataset. This flag contains “Yes” for all employees who left within 12 months and “No” for those who were still with the company. The analyst clicks “Next” to review the definition before executing the model generation:


Since John is no Data Scientist and doesn’t want to deal with manual optimization of the models, he uses SAP InfiniteInsight’s “Auto-selection” feature: When “Enable Auto-selection” is switched on (by default), SAP InfiniteInsight will generate multiple models with different combinations of the explanatory variables that John has selected in the previous screen. This way the tool optimizes the resulting model in regards to predictive power and model robustness (i.e. generalizability to unknown data). Simply put: When using this feature John will get the best model without having to deal with the details of the statistical estimation process. He now clicks “Generate” to start the model estimation process.



The Results

Eight seconds later, SAP InfiniteInsight presents John the results of the model training:


John reviews the results: His dataset had 19,115 records and 22 dimensions were selected for analysis. 9.02% of all employees inside the historical dataset (snapshot of 09/2013) left the company voluntarily between 10/2013 and 09/2014, i.e. within 12 months of the snapshot (=his target population), while 90.98% of employees were still employed. These descriptive results are in line with his turnover reports from Workforce Analytics.


John now looks at the model performance (highlighted in red) and sees that the best model that SAP InfiniteInsight has chosen has very good Predictive Power (KI = 0.8368, on a scale from 0 to 1 with 1 being a perfect model) as well as extremely high robustness (Prediction Confidence: KR = 0.9870, on a scale from 0 to 1). Also, of the 22 variables John had originally selected, the best model only needs 16: the remaining six variables didn’t offer enough value and have therefore been automatically discarded. Based on the model’s KI and KR values John concludes that not only does the model perform very well on his dataset – it can also be applied to new data without losing its predictive power. He is very happy with the results and clicks “Next” to progress to the detailed model debriefing.


John decides to look at the model’s gain chart to understand how much value his model offers for classifying flight risk employees compared to picking employees at random (i.e. not using any model at all). So he selects “Model Graphs”…


The graph compares the effectiveness of John’s model (blue line) at identifying flight risk employees with picking employees at random (red line), as well as with having perfect knowledge of who would be leaving (green line). Since the model’s gain (blue line) is very close to the perfect model (green line), John concludes that there is probably very little that could be done to improve the model further (for more information on how to read gain charts see here). The analyst decides it’s worth looking at the individual model components to understand which variables drive employee turnover. He clicks on “Previous” and selects “Contribution by Variables” on the “Using the Model” screen.

09-Variable_Contributions.png
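How a gain chart is computed can be shown with toy numbers: rank employees by model score, then track what cumulative share of the actual leavers is captured as you move down the ranked list. Random selection captures leavers only in proportion to how many employees you pick; the closer the model's curve hugs the top-left, the better. (Scores and outcomes below are invented.)

```python
# Build the points of a gain curve from (model_score, actually_left) pairs.
scored = [
    (0.9, True), (0.8, True), (0.7, False), (0.6, True),
    (0.3, False), (0.2, False), (0.1, False), (0.05, False),
]
total_leavers = sum(1 for _, y in scored if y)
captured, gains = 0, []
for _, left in sorted(scored, reverse=True):  # highest score first
    captured += left
    gains.append(captured / total_leavers)
print(gains)  # the random baseline would instead rise by 1/8 per step
```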

John looks at the chart and can see that the top three variables contributing to voluntary turnover are “JobLevelChangeType”, “Current Functional Area” and “Change in Performance Rating”. He decides to look at them in more detail by double-clicking on the bar representing each variable.


The most important variable is “JobLevelChangeType”, which describes how an employee got into his or her current position: the higher the bar, the greater the likelihood to leave within the next 12 months. John sees directly that being an external hire or having been demoted contributes significantly to turnover. He isn’t surprised to see “demotion” as a strong driver, since his company had begun using this approach only three years before to make the organization more permeable in both directions, and this has seen some resistance from employees. Based on the data, it seems that having been demoted drastically reduced employee retention.


Also, external hires seem more inclined to leave the company than to look for better opportunities within it, and John makes a note about this – he wants to discuss it with Amelia since he currently doesn’t see why external hires would behave this way.


Next, John looks at “Current Functional Area”:


John immediately sees his suspicions confirmed: Working in sales contributed significantly to employee turnover – and this by a wide margin! He continues to the third variable “Change in Performance Rating”:


The pattern John had observed in the first two variables continues – seeing one’s performance level decrease drove employees away, while improving oneself helped the company retain employees. The company has introduced a stack ranking system where performance levels are always evaluated relative to an employee’s peers to encourage growth and competition – especially in the sales department. However, as a consequence many employees see their performance rating decrease (12.8% of employees experienced this during the period) while there may not necessarily be anything wrong with an employee’s absolute performance: a previously high-performing employee may see his or her performance rating decrease while delivering the same results, simply because he/she is part of a high-performing team where some of the other team members had a better year. The results of the model hint at an unintended side-effect of this system – instead of putting up with decreasing performance ratings and training harder, the company’s employees tend to quit their jobs and try their luck elsewhere. John finds this interesting and plans to discuss with Amelia whether these effects are welcome in her department.


John looks at the remaining 13 variables to understand the other drivers better. He observes a strong influence of tenure on turnover levels (especially among mid-level employees with tenure between 5 and 9 years) and of not having had a promotion within the last three years. There also seem to be differences across countries, regions and demographic variables such as age or gender. The patterns John sees in the model paint the picture that the company does indeed have a problem keeping experienced employees, especially in the sales department – and the culprit seems to be the new stack ranking performance evaluation scheme John’s company implemented three years ago in an attempt to foster a more competitive and performance-oriented company culture. This is supported by the data from the countries: those few countries where the stack ranking system hasn’t been implemented yet have significantly lower turnover. The story that emerges is one of an experienced, well-performing employee who is confronted with the new performance evaluation scheme, sees his or her performance ratings drop while pressure rises, and then decides to leave.


John assembles the information into a presentation for his HR top management to address the topic. After a follow-up discussion with Amelia, who confirmed his conclusions, he is convinced that the stack ranking system is not tuned to the volatile sales business and serves as a driver of turnover. In preparation for the meeting, John decides to apply his model to current data to identify those employees from the sales department who are currently at risk of leaving.


The Prediction

John refreshes his dataset based on the most current data. Using the model’s confusion matrix John chooses a high sensitivity level to predict potential leavers. The confusion matrix compares the model's performance in classifying employees into leavers and non-leavers (=”predicted yes” / “predicted no”) against the actual, historical data (=”true yes” / “true no”). This way John can understand how well the model performs at classifying individual employees into leavers and non-leavers – every model makes mistakes but good models make fewer mistakes than bad models and the confusion matrix tells John which categories the model confuses with one another compared to the actual outcomes (hence the name “confusion matrix” – more info here).
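The arithmetic behind a confusion matrix is simple. With invented counts (not John's actual figures), sensitivity is the share of actual leavers the model flags, while precision is the share of flagged employees who really leave:

```python
# Toy confusion-matrix arithmetic (counts invented for illustration).
tp, fn = 57, 43    # actual leavers: predicted yes / predicted no
fp, tn = 20, 880   # actual stayers: predicted yes / predicted no

sensitivity = tp / (tp + fn)                 # share of real leavers caught
precision = tp / (tp + fp)                   # share of flagged who leave
accuracy = (tp + tn) / (tp + fn + fp + tn)   # overall hit rate
print(sensitivity, precision, accuracy)
```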


Using this model on the list of sales reps should give John a list of employees of which statistically 56.72% (the model’s sensitivity score) would actually leave the company within the next 12 months. John applies the model on his new dataset:


After applying the model, John looks at the resulting list: out of 2,120 employees, his model has identified 473 employees at risk, of which he knows about 57% will actually leave within the next year (although he doesn’t know who exactly will be leaving). Since some of these employees perform better than others and are therefore more important to retain, John filters the list of flight risk employees to only include experienced, well-performing sales reps and ends up with a shortlist of 215 employees. From these employees’ sales data in Workforce Analytics he calculates that losing 57% of them could cost the company up to $60M in lost sales. Also, at estimated recruiting and training costs of $150,000+ per new sales manager, this analysis could save the company up to 215 x 57% x $150,000 + $60M in lost sales = $78.3M.
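John's back-of-the-envelope figure can be checked directly with the model's exact 56.72% sensitivity quoted earlier:

```python
# Reproduce the $78.3M savings estimate from the figures in the text.
shortlist = 215            # filtered flight-risk sales reps
sensitivity = 0.5672       # the model's sensitivity score
replacement_cost = 150_000 # recruiting + training per sales manager
lost_sales = 60_000_000    # estimated lost sales

expected_leavers = shortlist * sensitivity          # ~122 people
savings = expected_leavers * replacement_cost + lost_sales
print(round(savings / 1e6, 1))  # millions of USD
```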



John discusses the list of 215 employees with Amelia and they decide to go to the HR Leadership Team meeting together to address the urgency of finding appropriate measures to retain these employees. Amelia and the HR Leadership Team are very impressed with John’s work and, faced with the huge impact of doing nothing, decide to free up some budget for appropriate retention measures while at the same time initiating a discussion whether to get rid of the stack ranking evaluation system to reverse the trend…




...and how are YOUR employees?

Employee retention is an important topic with a big impact on a company's bottom line. Seeing how simple it is to use SAP InfiniteInsight, maybe you'd like to try out a similar analysis yourself? A trial version of SAP InfiniteInsight is available here:




Have any other great ideas around using predictive with HR data? Feel free to post your ideas or questions in the comments!

In this blog I have tried to consolidate all the information regarding SAP Predictive Analysis under one umbrella. The main aim is to bring together the information relevant for SAP Predictive Analysis, from system setup to executing predictive algorithms, even for beginners. I have also drawn on information from other blogs.


SAP Predictive Analysis falls under two main categories: the Predictive Analysis Library (PAL) and SAP InfiniteInsight.

We can make use of the Predictive Analysis Library (PAL) in two main ways:

  • Using HANA PAL libraries directly from HANA studio or
  • Using SAP Lumira Predictive Analysis Tool




This is where it all started. Once you get access to a HANA system (I hope you already have HANA Studio installed; if not, please install it), you cannot directly start working with PAL algorithms: there are certain prerequisites to check whether the HANA system is capable of executing PAL algorithms.


PAL libraries are available from SAP HANA SPS06 onwards, but you can always go for the latest version if it is available. With every upgrade, the HANA team has brought in many updates and features; HANA SPS08 has around 50+ PAL algorithms available. Basically, PAL defines functions that can be called from within SQLScript procedures to perform analytic algorithms.


One can check whether the PAL libraries are successfully installed in your system by executing SQL statements in the SQL console; for example, the check described in the SAP PAL documentation queries the AFL catalog views:


SELECT * FROM "SYS"."AFL_AREAS" WHERE AREA_NAME = 'AFLPAL';

SELECT * FROM "SYS"."AFL_PACKAGES" WHERE AREA_NAME = 'AFLPAL';

SELECT * FROM "SYS"."AFL_FUNCTIONS" WHERE AREA_NAME = 'AFLPAL';

You will not see any results if the PAL libraries are not installed on the HANA system you are working on. Contact your system administrator if the libraries are not installed, or, if you have system administrator access, follow the steps mentioned in this blog: PAL Libraries setting up on HANA System.


Once this is done, you need to grant your user the privileges for executing PAL library functions. This can be done by executing the following statements:


GRANT EXECUTE ON system.afl_wrapper_generator to I068235;

GRANT EXECUTE ON system.afl_wrapper_eraser to I068235;


Here I have used my own user name, but note that you cannot grant this privilege while logged in as that user. (This is very important: you can never grant privileges to your own user while logged in with it, so always grant privileges from a different user.)

Once this is also done you are good to go.


You can check the SAP PAL documentation for a detailed description of the PAL algorithms. All the PAL algorithms are explained with use cases in this document.


This link always refers to the latest document regarding SAP HANA Predictive Analysis Library, and it will include the features of latest productive version of HANA.


You can see examples of all the algorithms in the above-mentioned document; they are very well explained. The only thing is that you have to select the best possible algorithm based on your use case and scenario. The PAL libraries/algorithms are divided into 9 data mining categories. One frequently used category is Time Series algorithms: if you want to forecast new values, these are the best algorithms available. There are five different algorithms under the Time Series category.


E.g.: Double exponential smoothing.


You can watch the video on Double Exponential Time Series to get a better understanding of time series algorithms. The video clearly explains the steps to follow when working with the PAL time series algorithms. Similarly, there are videos for the other time series algorithms as well.
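For intuition about what this PAL function computes, here is a minimal sketch of Holt's double exponential smoothing in Python (the smoothing parameters alpha and beta and the toy series are invented for illustration; PAL's implementation offers more options):

```python
def double_exponential_smoothing(series, alpha, beta, horizon):
    """Holt's linear method: maintain a smoothed level and trend,
    then extrapolate both `horizon` steps into the future."""
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + (step + 1) * trend for step in range(horizon)]

history = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]
forecasts = double_exponential_smoothing(history, alpha=0.8, beta=0.5, horizon=3)
print(forecasts)  # three increasing values continuing the upward trend
```

Because the method tracks a trend as well as a level, it is well suited to series that grow or shrink steadily, which is why it appears so often in forecasting examples.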

SAP Predictive Analysis and SAP Lumira


To avoid any confusion: SAP Predictive Analysis is an altogether separate installation from SAP Lumira. If you have already installed SAP Lumira, you will have to uninstall it before installing SAP Predictive Analysis.


You can download SAP Predictive Analysis from the Service Marketplace.


You need special privileges to download any software from the Service Marketplace, which most people don't have by default. You can request permission from the same page; the request goes to your direct reporting manager for approval.


Once you install the tool, it runs as a 30-day trial; the license expires after 30 days.


You can watch the video on SAP predictive analysis tool setup to have an idea about installation and setting up of SAP predictive analysis tool.


In this video they also cover connecting to a HANA system from the Predictive Analysis tool. When trying to connect to the HANA systems, give the SAP HANA server as: lddb<system ID>.wdf.sap.corp


Once it is connected you can directly pull data from the tables as mentioned in the video.


If you have already tried out Double Exponential Time series from HANA studio, the next steps will be easy.


You can drag and drop the algorithms which you want in the predict tab. The screen will look like this now.


PAL Algorithms.png

Once this is done, you can change the properties of the algorithm by clicking the settings button on the selected icon. Then select Configure Settings, and a screen will appear where you have to enter all the mandatory values. See the screenshot below for an example.


PAL Tool Properties.png


Once the configuration is done, run the algorithm from the same screen. You will see the results in a table, and if you go to the Trend chart you will be able to see the predicted values in a graph like the screen below.


Double Smooth.png


Working with this tool is that easy, and once we get the results from the algorithm we also have the option to write the data back to the HANA DB.


Most of the PAL algorithms available in HANA systems are available in the SAP Predictive Analysis tool, except for a few. Selecting a particular algorithm is as easy as drag and drop. You don't even have the additional overhead of creating signature tables for calling a PAL algorithm. (You will come across signature tables if you call a PAL algorithm from HANA Studio; there we even have to create a result table to store the result data once the algorithm executes successfully.) In this tool, everything can be maintained as properties of whatever algorithm you have selected, and once you execute the algorithm the results can be displayed directly on a graph.


SAP InfiniteInsight


SAP acquired KXEN mainly for its automated predictive analysis capabilities, and with this acquisition has renamed the software to SAP InfiniteInsight.


InfiniteInsight can be downloaded from the Service Marketplace.


There are different versions available in the Service Marketplace. The latest version occupies 2.5 GB, but you don't have to install the entire setup; to make it easier, you can download just the object 'IIWS7000_0-80000274.EXE'. Give this as the search term and download the file. The image below is a screenshot of searching for it so that the .exe file comes up in the search results.




Once you install SAP InfiniteInsight, you can directly start working on it. When you open the software you will see a screen like this:


Infinite Inisght Home Screen.png


SAP InfiniteInsight is a vast topic and there are lots of features associated with it.


The SAP InfiniteInsight help on SDN will give you a fair idea of the tool and the features it offers.


If you have any doubts regarding setting up the SAP InfiniteInsight tool and connecting to a particular DBMS, you can go to the SAP InfiniteInsight Help Portal.


It gives an in-depth understanding of each and every topic, and all the features like Explorer, Modeler, Social, Recommendation and Toolkit are explained in detail in separate documents.


There are already some interesting blogs written for Explorer and Modeler; you can read those as well.


Since it is very difficult to cover all the features in one single blog, I will try to write another blog exclusively on SAP InfiniteInsight, covering one end-to-end use case.


Feedback is welcome!

Hi everyone,


At long last, we now have a customer-facing website (Ideas Place) dedicated to Predictive Analytics & InfiniteInsight!


Predictive Analytics: Home


Please use it to suggest product enhancements to our Advanced Analytics line.


Our Product Management team is looking forward to your suggestions!


Many thanks to Marc DANIAU  for making this happen.


Kind regards,


Revisiting the Technical Content in BW Administration Cockpit with SAP Predictive Analysis

The following blog post demonstrates how to use the technical content of SAP BW as a forecast data basis for a prognosis model in SAP Predictive Analysis. The aim is to show a smooth and straightforward process, avoiding additional modelling outside of BW as much as possible. In the described use case, the Database Volume Statistics[1] have been chosen as an example.


The official SAP Help summarizes the Technical Content in BW Administration Cockpit as follows: “The technical BI Content contains objects for evaluating the runtime data and status data of BW objects and BW activities. This content is the basis for the BW Administration Cockpit, which supports BW administrators in monitoring statuses and optimizing performance.”[2]


The Technical Content, with its pre-delivered web reporting, might look a bit old-fashioned; nevertheless, the variety, quality, and quantity of data that is “generated” at any time in the system is very useful and important for further analysis. The data has a strong focus on performance (e.g. query runtimes, loading times), but other system-related data such as volume statistics is also available.



BW on Hana and SAP Predictive Analysis[3] together extend the possibilities of how to see the data and what (potentially more) to do with it.[4]

Technically there are simply the following 3 steps to follow[5]:

  1. Expose cube information model to Hana (SAP BW)
  2. Adjust data types to PA-specific format (Hana Studio)
  3. Create forecast model (SAP PA Studio)


The Database Volume statistics in the technical content are designed with a simple data model consisting of just one cube with some characteristics (day, week, month, DB object, object type, DB table etc.) and key figures (DB size in MB, number of records etc.). Following the above steps with this set of data and choosing a certain type of algorithm results in the bar chart shown below, integrated with forecast figures for the past and some months into the future.


The blue bars represent the actual database size by month. The green line represents the calculated figures of the forecast model (in this case Double Exponential Smoothing) for the past 20 months and 10 months into the future.



Below are some technical details for each of the mentioned steps:


(1) Expose information model of Infocube 0TCT_C25 to Hana Studio[6]

  • Edit the Infocube in BW and set the flag for “External SAP HANA view”:



Immediately the information model is generated as an Analytic View and can be viewed in Hana Studio:

  • Content -> system-local -> bw -> bw2hana -> 0 -> Analytic Views -> TCT_C25



(2) Adjust data types to PA-specific format (Hana Studio)

  • The generated Analytic View of Infocube 0TCT_C25 looks like below:


SAP Predictive Analysis currently needs a specific time-ID column, and the key figures must be of data type DOUBLE. The new Calculation View CV_TCT_C25_1 is created based on the generated Analytic View TCT_C25:

  • Column [Month] (PA_TIME_ID_MONTH) = <unique sequential number for each month>[7]
  • Column [Database Size] (PA_TCTDBSIZE) = DOUBLE(0TCTDBSIZE)



(3) Create forecast model (SAP PA Studio)


Creating a forecast model in SAP Predictive Analysis follows the same standard tasks as for any other data source.


  • Select the data source, i.e. the prepared calculation view including the (time) key ID column and the relevant key figures
  • Select and configure the components for the model:
    • Use a [Filter] component (if necessary restrict columns and rows, e.g. filtering the relevant database object types, time range etc.)
    • Choose an adequate [Algorithm] component; in the following case a double-smoothing algorithm (PAL) was chosen to forecast several months into the future



And finally the resulting trend diagram is shown (see above).




[1] Infocube 0TCT_C25

[2] SAP Help Portal -> Technology -> SAP NetWeaver Platform

[3] This post deals with SAP BW on Hana 7.40/SP6 and SAP Predictive Analysis 1.19

[4] The blog post is focusing on the technical aspects to get a forecast model successfully executed. The chosen algorithm might not be statistically appropriate.

[5] Assuming the technical content has been activated in SAP BW

[6] Unfortunately it’s not yet possible to expose the information model of a Multiprovider

[7] Data used is from April 2013 to November 2014. To get a unique ID the following calculation is used (in order to get a sequence starting from 1):

    (int("0CALYEAR") - 2013)*12 + int(rightstr("0CALMONTH",2)) - 3
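As a quick sanity check, the same expression can be replicated in Python (assuming 0CALYEAR is a four-digit year and 0CALMONTH is a YYYYMM string, as in the technical content):

```python
def pa_time_id(calyear: int, calmonth: str) -> int:
    """Replicates (int("0CALYEAR") - 2013)*12 + int(rightstr("0CALMONTH", 2)) - 3."""
    return (calyear - 2013) * 12 + int(calmonth[-2:]) - 3

print(pa_time_id(2013, "201304"))  # 1  -> April 2013, first month of the data
print(pa_time_id(2014, "201411"))  # 20 -> November 2014, last month of the data
```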

SAP uses Advanced Analytics expertise to support the fight against Ebola


A team within SAP is developing an analytical application to help combat the spread of Ebola. The current outbreak poses a global health and safety threat and requires the help of everyone to be contained.


All hands on deck: The outbreak of Ebola in several West African countries, and the threat of it spreading to Europe and the United States have mobilized hundreds of volunteers around the world to combat its spread. Volunteers have ranged from individual healthcare workers to global companies like SAP, who have joined forces to develop a cutting edge advanced analytics solution to support the helpers in their challenging task of fighting the disease. Our goal is to provide large health organizations with this application to support their mission. This solution promises to be not only valuable in the field in Africa, but can also be used by state authorities to screen passengers of incoming flights from affected countries.


Our plan: We want to make an efficient and fast diagnosis of the disease possible, which is essential for medical personnel to make the right treatment decisions. The developed application will first enable doctors and helpers to gather data on infectious diseases. This information will be subsequently fed into a central database. Based on input from remote doctors and machine learning, the application identifies whether a patient may have been infected with Ebola.


Kevin Richards, Head of U.S. Government Relations at SAP interviewed the WHO and US State Department representatives to identify the key challenges that operators in the field are facing. It became clear that one of the biggest influencing factors is the ability to collect the patient data when in most cases there is no stable connection to the internet. Hence, the quality of collected data will be determined by the robust offline capabilities of the application, which then can be synchronised to an overall data hub as soon as an internet connection becomes available.


Data collection & Diagnosis: Whenever a doctor or a volunteer thinks someone may be showing signs of an infectious disease, they can open the application and navigate to the “Add Patients” tab. The doctor can take a picture or make a video of the patient and report their symptoms. This data is then sent to a central database along with the doctor's geo-location and submission time. The application is a cloud solution which can be accessed easily on any mobile device. The collected data can be stored on the device and synchronised later, as soon as an internet connection becomes available.


Once the data is uploaded, remote doctors are able to comment on each patient, help with the diagnosis and give treatment recommendations. Meanwhile SAP's Advanced Analytics solution InfiniteInsight clusters the described symptoms, patient data and the judgments of the doctors in the background of the application. This way it can be determined which reported symptoms are most highly correlated with an Ebola diagnosis. For example, chills, blurred vision, nausea and vomiting, ulcer, severe headache, and unexplained hemorrhage are the symptoms that matter most in determining whether a particular patient may or may not be infected with Ebola. Upon further analysis of the contributing variables, it becomes clear that ulcers, chills, and blurred vision are the most commonly reported symptoms not associated with an Ebola diagnosis. Conversely, the contributing variables of nausea and vomiting, unexplained hemorrhage, and severe headache are associated with the disease. As the algorithm determines which symptoms are significant indicators, the application is able to push a preliminary diagnosis to the helpers even in offline mode, and appropriate treatment can commence without delay. Additionally, the application will allow the tracking of any mutations and subsequent symptom changes of the disease over time and geography.



Forecast: One of the biggest challenges is to understand how the disease will spread during the coming weeks. Hundreds of lives could be saved if we were able to predict in which cities Ebola is going to break out next. With SAP's Advanced Analytics we can provide a tool that gives the necessary insight into the future spread and development of the disease, based on the data patterns of the collected incidents. Users will also be able to view an infographic in the app showing the current spread of Ebola and information about the appropriate safety measures.



In my previous blog, SAP InfiniteInsight - Explorer, I demonstrated how you can create a data set for further analysis.



In this blog I will focus on the SAP InfiniteInsight Modeler to create a model on the data set from my previous blog.


In the previous blog we prepared data from a garden retailer that has a coffee shop. We prepared the data so that in this blog we can analyze what will influence someone visiting the garden shop to most likely have a dessert or cake at the garden retailer's coffee shop.


So let's start...


From the welcome screen I will select "Create a Classification" under the Modeler section. As you can see different types of models can be created.


Figure 1




I have now selected the data from the explorer. I have selected Analytical Record Set 1.


Figure 2



Pressing Next takes you to the next screen, which will be blank until you press the Analyze button; then figure 3 will be displayed. At this step we can also view the data if we need to.


Figure 3



Now we select the target variable which we want to analyze; the target variable is who bought cakes or desserts. We also exclude some variables. So here we are asking the model to determine what impact the other variables have on our target variable.


Figure 4



The next screen will then show the summary of the model.


Figure 5



The model generation will start, also known as “training the model”.


Figure 6



The results of the model will be shown as seen in figure 7. It is important to know the following values shown and the meaning of the values.

  • KI - a measure of how powerful the model is at predicting. This is a number that ranges between 0 and 1. The closer the KI is to 1, the more accurate the model is.
  • KR - a measure of robustness, or how well the model generalizes on an independent hold out sample. KR should be a number ideally above 0.95.


So based on the above, our KI measure is poor, but it will serve our purposes for this blog.


Figure 7

We can now review the model results by selecting the appropriate options.


Figure 8

By selecting "Contribution by variable" we can see which aspects influence the scenario: firstly pets, then children, then the segment, then the age, etc.


Figure 9

We can now take it further and analyze the age variable. Here we see that people aged 18-26 and 48-70 are likely to buy a cake or dessert, while individuals aged 26-48 are less likely.


Figure 10.

So this tells us the coffee shop will have better success with cakes and desserts that appeal to people who have pets, have children, and are in the age ranges identified. This will help deliver more precise advertising if needed.


I hope the above shows how a predictive model can be created by just clicking away, and how the results can be a valuable tool.

Developers are rarely shy about sharing their views on new tools and technology. I appreciate their passion and healthy scepticism, in fact I seem to have developed my own slightly cynical perspective. So when I heard we’d added SAP InfiniteInsight (formerly from KXEN) to the SAP OEM offering (it’s my job to build OEM marketing content), I quietly wondered how relevant the solution was going to be for our OEM partners.


As I gathered solution information my sceptical attitude soon began to shift to one of pleasant surprise at how ‘cool’ the functionality was. I knew that predictive analytics was about looking at data and forecasting the likelihood of future events, and yes that is cool, but that’s not what impressed me. My own experience of working with in-house data scientists (dudes with PHDs in statistics and analytics) had shown me that creating a predictive model for optimizing campaign lead follow-up takes weeks, if not months. The process required the identification of predictive variables and development of a consistent model for using those variables to score prospects based on how likely they were to buy, and then involved lots of iterative testing.



What I hadn’t expected was SAP InfiniteInsight’s ability to self-learn from historical data… and identify the predictive variables without a data scientist in the room. In fact, the software can continuously relearn and adapt its scoring based on current target audience actions.


Next I’m thinking, ok this would be great value-add for any partner building customer management solutions or operations software but it must be pretty tricky to integrate… and I was once again pleasantly surprised. SAP InfiniteInsight’s core functionality resides in 4 DLLs totalling just 1.5 MB with comprehensive APIs.

That means our OEM partners can relatively easily embed the technology, point the solution at a historical database, let it figure out the predictive characteristics, and then use those variables to score a net-new target individual or a dataset of many individuals. This can even be done in real time: if someone surfing my ecommerce site has selected an item to purchase, I can instantly offer up the next-best three items as suggestions, based on what others have typically bought with the first item.
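At its simplest, that "bought together" suggestion reduces to co-occurrence counting over historical baskets. A toy sketch (the basket data and function below are invented for illustration; InfiniteInsight's actual recommendation engine is far more sophisticated):

```python
from collections import Counter

# Invented historical baskets: lists of item IDs purchased together
baskets = [
    ["plant_pot", "soil", "trowel"],
    ["plant_pot", "soil", "seeds"],
    ["plant_pot", "seeds", "gloves"],
    ["soil", "seeds"],
]

def next_best(item, baskets, k=3):
    """Return the top-k items most often bought together with `item`."""
    co_counts = Counter()
    for basket in baskets:
        if item in basket:
            co_counts.update(other for other in basket if other != item)
    return [other for other, _ in co_counts.most_common(k)]

print(next_best("plant_pot", baskets))  # ['soil', 'seeds', 'trowel']
```

The self-learning angle is that the counts are recomputed as new baskets arrive, so the suggestions adapt to current buying behaviour without anyone re-specifying rules.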


This really gets the brain cells firing in terms of the all the potential scenarios where SAP InfiniteInsight might extend existing application value, and drive increased customer satisfaction and loyalty. Here are just a few of the scenarios that I thought were appealing.

For CRM related applications:

  • Optimize direct marketing campaigns to boost response rates
  • Analyze customers’ website touch points to improve their online experience
  • Target customers that have a high propensity to churn with new customized offers
  • Analyze customer purchasing histories to deliver targeted up-sell recommendations


For business operations:

  • Predict how market-price volatility will impact production
  • Foresee changes in demand and supply
  • Analyze streams of machine data to build proactive maintenance schedules
  • Forecast customer demand and optimize inventory in real time


For finance solutions:

  • Analyze sales transactions to identify unsafe investments
  • Predict patterns of fraud within Big Data
  • Perform credit score analysis in real time


And I almost forgot: if you're interested in turbocharging your predictive analytics performance, you can also pair InfiniteInsight with the in-memory computing power of SAP HANA for a real-time experience.

In the end, my mindset had completely reversed from skepticism to optimism, but for those of you with that skeptical bone in your body, I invite you to do your own investigation. I've included a couple of links to speed the process.


SAP InfiniteInsight home page

SAP InfiniteInsight Industry and LOB scenarios

SAP Predictive Analytics OEM eBook

SAP InfiniteInsight Introduction and Overview Blog


If you’re interested in learning more about…

  • building predictive models in minutes or hours, not weeks or months
  • integrating automated predictive modeling into your applications
  • increasing your application footprint at existing customers

then please reach out to our OEM team. Many SAP OEM partners are already using SAP InfiniteInsight to differentiate their offerings and open new revenue streams.


Get the latest updates on SAP OEM by following us @SAPOEM on Twitter

For more details on SAP OEM partnership, and to learn about SAP OEM platforms and solutions, visit us at www.sap.com/partners/oem



I have not seen much posted regarding InfiniteInsight. I thought I would take some time to demonstrate parts of this product.


InfiniteInsight is a predictive analysis tool that SAP gained through its acquisition of the company KXEN.


This tool is designed to make the process of using a predictive tool easier, with less reliance on a data scientist. Also, everything is done by just CLICKING AWAY.


When you launch the product you will see Figure 1 as your entry point. In this blog I will focus purely on the Explorer part. Explorer is used to get your datasets into a format we can use to build predictive models on.

1. Explorer.png

Figure 1 - InfiniteInsight




So the first step is to create explorer objects; you will need to select the source of the data. In this scenario we are pulling from HANA.

2. Connect To Data.png

Figure 2 - Create or Explorer Objects


You can then create your datasets. In my example I have already created the datasets, all done by clicking and with no code. I have created three types of data sets:

  1. Entity
  2. Time Stamped
  3. Analytical Record


Figure 3 - data sets


I won't be showing how I created each data set, as there are a few screens that would need to be captured and that would make the blog too long. Here is an example of the entity data: data describing the entity that will be analyzed.


Figure 4 - entity data

Example of time stamp data; here we just create time entries.


Figure 5 - Time Stamp Data


For the analytical record we have basically taken the time stamp data and joined it with the entity data; when creating this we can choose which fields to keep or exclude.


Figure 6 - Analytical Record


You can create different versions of each type of data; here I have a second analytical record set. It is the same as the first one except that we have added some calculated columns: a sum, a count, and a count distinct. Once again, created just with clicks and no code.


Figure 7 - Analytical Record 2

I have also created a third analytical record where we have added extra columns that are pivoted, so we can use them to analyse even further.

As seen above, the Explorer part allows you to get different sets of data and combine them, and to do counts, pivots and more. Once the data is arranged in the desired format, you can move to the next section to run predictions on it.


I will try to cover that in another blog.

First some background about the issue:
      InfiniteInsight (II) does not let you use your analytical views, calculation views and so on in the user interface.

In the background, II uses the capabilities of the ODBC driver to get the list of "data spaces" to present to the user, via a standard ODBC function.

Unfortunately, the HANA ODBC driver does not currently include the names of analytical views and calculation views.


However, this ODBC driver behavior can easily be bypassed in two ways:
- simply type in the full name of the calculation view (including the catalog name), like "PUBLIC"."foodmart.foodmart::EXPENSES"
- configure II to use your own custom SQL that lists the items you want to display.

This feature is used in II to restrict the list of tables, for example when your data warehouse has hundreds of schemas.


One file needs to be changed, depending on whether you are using a workstation version (KJWizard.cfg) or a client/server version (KxCORBA.cfg), by adding the following content:


ODBCStoreSQLMapper.MyDSN.SQLOnCatalog1="  SELECT * FROM (   "


ODBCStoreSQLMapper.MyDSN.SQLOnCatalog3="  UNION ALL   "

ODBCStoreSQLMapper.MyDSN.SQLOnCatalog4="   SELECT '""' || SCHEMA_NAME || '""', '""' || VIEW_NAME || '""', VIEW_TYPE FROM SYS.VIEWS WHERE NOT EXISTS (  "


ODBCStoreSQLMapper.MyDSN.SQLOnCatalog6="         WHERE SCHEMA_NAME = a.CATALOG_NAME AND VIEW_NAME = a.CUBE_NAME AND ( MANDATORY = 1 OR MODEL_ELEMENT_TYPE IN ('Measure', 'Hierarchy', 'Script') )  "


ODBCStoreSQLMapper.MyDSN.SQLOnCatalog8="  ) order by 1,2   "


The KxCORBA.cfg file (used in a client/server installation) is located in the InfiniteInsight server installation directory:

     C:\Program Files\SAP InfiniteInsight\InfiniteInsightVx.y.z\EXE\Servers\CORBA

where x.y.z is the version you have installed.


If you are using a standalone (a.k.a. Workstation) installation, then the file to modify is KJWizard.cfg, which is located in:

     C:\Program Files\SAP InfiniteInsight\InfiniteInsightVx.y.z\EXE\Clients\KJWizardJNI

where x.y.z is the version you have installed.


In this example I only include tables, views, and calculation/join views that have no mandatory variables and no 'Measure', 'Hierarchy', or 'Script' variables at all.


You may need to adjust this configuration SQL if you want to list Smart Data Access objects.


Note that here we are changing the behavior for a single ODBC DSN (MyDSN), so this value will likely need to be adjusted for your environment.

You can also replace it with a star (*); the configuration is then applied to all ODBC DSNs, which may not work on other databases.


Some functionality in II may not yet work properly despite this workaround.

For example:

  • data manipulations require the configuration file change
  • view placeholders and, in general, view attributes are not properly supported
  • some types of aggregates are not "selectable by name", which means that if they are used in a select statement in HANA Studio they will not be returned (select * vs. select cols)


Hope this will save you some time

