Intelligence Follows Data

ttrapp · 11-20-2012

Although ABAP develops in an evolutionary way, SAP’s solution for in-memory technology will affect your daily work as an ABAP developer.

First you will solve existing pain points in your custom code to make important applications significantly faster.
Then you will take the quick wins so that your enterprise gets immediate benefit from your HANA investments.
After those first steps you will create applications that use the full power of SAP HANA and introduce completely new features to your applications.

In this blog entry I will continue my blog series about analytical applications in ABAP for HANA. I will share my ideas regarding the second and third bullet points in the list above and discuss which skills you will need in the future.

Quick Wins in ABAP for HANA

Some of you will still remember the times when SAP started to introduce Enjoy SAP Controls like the ALV grid, which allowed a better user experience and also brought some remarkable features for business users such as Excel integration. Lots of SAP consultants made lots of money just with the reuse function modules that display a selected set of database rows in an ALV grid. This method is now common folklore, but it shows that making data accessible can make a business application much more valuable for business users. ABAP for HANA offers new possibilities that every ABAP developer has to know:

The ALV for HANA lets users navigate through huge data sets in real time.
New search helps will make data accessible in a completely new way by offering linguistic search, multi-column search and fuzzy search. Your apps will have the same look and feel as Google Suggest, which completes the input data and helps users find the information they are looking for within a short time.

So I recommend that you learn about these new features to keep your knowledge up to date.

Mashups create Synergy-Effects of Different SAP Technologies

In my last blog I gave an example of how dashboards can help users make better decisions. In fact you don’t have to use libraries like D3: you can also use SAP tools that allow fast visualization and make the creation of dashboards very easy.

In fact you can use any tool that allows you to perform queries on HANA data and expose them as a web application. You can integrate these web applications in the following ways:

using HTML controls in ABAP Dynpro or IFrames in Web Dynpro for ABAP
as an Island in a Web Dynpro for ABAP application
as CHIPs arranged using the Page Builder that enhance existing SAP Business Suite applications (in ABAP Dynpro or Web Dynpro) that are displayed on the side panel of the SAP NetWeaver Business Client (NWBC).

In the latter case you can “wire” the CHIPs so that information is exchanged between them and the SAP Business Suite or your custom-developed application. For you as an ABAP developer working in an SAP HANA environment, mashups will become more and more important and you should learn how to develop them.

At SAP TechEd there was a session, TEC160 “SAP Technology Highlights – Putting them all together”, that shows the interplay between SAP Business Suite, SAP NetWeaver Business Client, HANA and the BusinessObjects tools. I definitely recommend this session.

Code Pushdown by Using Standard HANA Libraries

In the past as well as today, ABAP developers use a treasure chest of reuse tools for rapid development of business solutions. These include huge frameworks as well as libraries like the FIMA package, which contains function modules and classes for financial mathematics. In the future more and more libraries will become available inside HANA, like the Business Function Library (BFL).

There will be various applications of the above-mentioned libraries and techniques. For example, you can use fuzzy and multi-column search to look for duplicates of business partners and other business objects.
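To make the duplicate check more tangible, here is a minimal sketch in Python. It is not HANA's fuzzy search and says nothing about how HANA scores matches: it only illustrates the idea of comparing several columns with a tolerant string similarity. The field names, the sample records and the threshold of 0.85 are assumptions made for this illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Tolerant string similarity between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(partners, columns=("name", "city"), threshold=0.85):
    """Return pairs of partner ids whose average similarity over all
    given columns reaches the threshold (hypothetical default: 0.85)."""
    pairs = []
    for p, q in combinations(partners, 2):
        score = sum(similarity(p[c], q[c]) for c in columns) / len(columns)
        if score >= threshold:
            pairs.append((p["id"], q["id"]))
    return pairs


# Hypothetical business partners, invented for this sketch
partners = [
    {"id": 1, "name": "Mueller GmbH", "city": "Walldorf"},
    {"id": 2, "name": "Muller GmbH", "city": "Walldorf"},
    {"id": 3, "name": "Schmidt AG", "city": "Berlin"},
]

duplicates = find_duplicates(partners)
print(duplicates)
```

Comparing every pair of records is fine for a sketch but grows quadratically, which is exactly why pushing such a search down to the database is attractive for large partner tables.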

Decision Support as Unique Selling Point of Next Generation Apps

Mashups are the most important feature of next generation apps because they visualize operational data. Using them, users will make faster and better decisions when doing their work.

Next generation analytical applications for ABAP introduce quantitative approaches to decision making into enterprise resource planning. Let me give some examples to sketch this in detail:

In an insurance company, an official in charge wants to know how much an insurance claim will probably cost. For this we don’t need a sophisticated model: we compare the insurance claim with similar ones from the past to make a prediction and can therefore make better decisions when working on the claim. In the SAP solution we usually work with business rules (the ABAP frameworks are BRF and BRFplus) to make processing intelligent, and the result of a forecast can be used in those business rules. Please be aware that business rules by themselves are weak at making predictions and that they are hard to change, because this needs expert knowledge of the SAP solution and especially customizing changes. So forecasts can make the management of insurance claims more agile.
Predictions are very important in financial solutions as well as in portfolio management. Here we have complex statistical methods, and more and more of them will be supported directly by SAP HANA using native planning and calculation engines that benefit from the multicore architecture of SAP HANA and give you immediate results even when working on huge data sets.
In my last blog I gave the example of calculating the account balances of a certain customer. Suppose the customer is in the red: the task is then to predict how the balance will develop in the future.
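The insurance example can be sketched in a few lines of Python. This is only an illustration of "compare the claim with similar ones from the past" and not a description of any SAP component: the two numeric features per claim (assumed to be scaled to the range 0 to 1), the sample data and the choice of k = 3 are all assumptions.

```python
from math import dist


def estimate_claim_cost(new_features, past_claims, k=3):
    """Estimate the cost of a new claim as the average cost of the
    k most similar past claims (similarity = Euclidean distance)."""
    nearest = sorted(past_claims,
                     key=lambda claim: dist(new_features, claim["features"]))[:k]
    return sum(claim["cost"] for claim in nearest) / len(nearest)


# Hypothetical past claims: (feature_1, feature_2) already scaled to 0..1
past_claims = [
    {"features": (0.20, 0.10), "cost": 1000.0},
    {"features": (0.25, 0.15), "cost": 1200.0},
    {"features": (0.30, 0.10), "cost": 1100.0},
    {"features": (0.90, 0.80), "cost": 9000.0},
    {"features": (0.85, 0.90), "cost": 9500.0},
]

estimate = estimate_claim_cost((0.22, 0.12), past_claims)
print(estimate)
```

The result of such an estimate is just a number, so it could be handed to a business rule as one more input value, which is the combination of forecasts and rules described above.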

SAP is very strong in the area of business processes and business intelligence, so it has tools that allow real-time monitoring, real-time situation detection and real-time dashboards for different user roles. So the insight-to-action paradigm, which is explained very well by the example of SAP CRM Competitor Analysis, will become ubiquitous in your applications.

In my opinion most SAP users are well prepared for this step because we all know the basics of descriptive statistics: we can read different chart types, and most of us have at least an intuitive idea of statistical parameters like arithmetic mean, median, standard deviation, variance, range and absolute deviation. When it comes to predictive models the situation becomes a little more difficult, because from experience those concepts are not taught at school and they are only used by some experts in particular industries like production, production planning, financial services and insurance, to mention a few.
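As a quick refresher, the statistical parameters mentioned above can be computed with Python's standard library. The sample values are invented, and I read "absolute deviation" here as the mean absolute deviation from the arithmetic mean, which is an assumption on my side.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "range": max(values) - min(values),
        # assumption: mean absolute deviation around the arithmetic mean
        "mean_abs_deviation": sum(abs(v - mean) for v in values) / len(values),
    }


# Hypothetical monthly account balances
balances = [120.0, 80.0, 100.0, 140.0, 60.0]
summary = describe(balances)
print(summary)
```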

HANA offers various methods for statistics and data mining through integrated statistical functions: we can set up an R server on the HANA system and call R from SQLScript, and HANA has an integrated library for predictive analysis, the Predictive Analysis Library (PAL). Which is more useful? In fact this is up to your requirements: if you are familiar with R, you can choose from many libraries and even create your own. Since R is a separate server on a HANA system, the call can become a bottleneck, whereas PAL can benefit directly from HANA’s multicore architecture.

The most interesting question is how this can help us to create next generation ABAP applications. Before I come back to that question I would like to discuss how quantitative (= mathematical) approaches are used in the commercial sphere.

Is Corporate Culture ready for Quantitative Approaches in Operational Excellence?

With automated decision support we introduce the aspect of descriptive and inferential statistics to SAP Business Suite. Statistics and Econometrics already have a plethora of different models, methods and algorithms and many of these techniques are now supported directly within SAP HANA.

Unfortunately tool support is a necessary but not a sufficient precondition, because from experience the use of inferential statistics is limited to certain industries and lines of business. I think this has many reasons:

Building sound models requires statistical knowledge that is usually not part of school education. Usually this is done by statisticians who work iteratively: they start with visualization, then make assumptions, choose a model and its parameters, and finally do the computation and analysis.
Many managers act on instinct when making decisions: they look at the current facts and figures and have macroeconomic and political trends in mind.
Quantitative approaches in management science (think of inferential statistics or operations research) are still a niche. Many managers have only limited mathematical skills.
Many scientists working in the commercial sphere have statistical skills but don’t use them. It is common folklore that most mathematicians will never apply their skills after they leave university.

Let me summarize: quantitative approaches in management science are well developed in certain lines of business like finance and insurance, and much less so in others. IMHO there are good and bad reasons for the dislike of mathematics:

Many people dislike mathematics and in fact I consider this one of the strongest cultural taboos nowadays. There are many people who say that they “hate” mathematics and are proud that they never understood it at school. This is really sad, because if you don’t understand the basics of mathematics you will never understand important aspects of science or of economic and public life.
But there are many good reasons to question the value of statistics. Of course it is the basic science of all empirical sciences and you should know about concepts like median or variance when discussing observations but does statistics really help you?

I will try to answer the last question and therefore I will try to describe what statistics is about.

How do Statisticians work?

Let’s discuss an example: a series of values (think of account balances) can be seen as a series of data points measured at successive time instants (perhaps the first of every month). The result is a curve of the payment history of a customer. So how can we predict future data? SAP Mentor Alvaro Tejada explained a possible solution months ago in one entry of his visionary blog series about HANA and R: http://scn.sap.com/people/alvaro.tejadagalindo/blog/2012/01/13/prediction-model-with-hana-and-r As a mathematician I like this approach and I hope that many people will start to gain experience with it. Nevertheless I have to explain the limitations of this approach by listing its implicit assumptions:

First, not every stochastic time series allows predictions: think of rolling an (ideal) die, for example.
The R predict function performs a so-called simple linear regression. In fact it is possible that the linear approach is not appropriate for a concrete time series.
Most models in predictive analysis are much more complex and combine additive as well as multiplicative aspects. A well-known example is ARIMA: there are ARIMA implementations available in R (for example as an extension of the above-mentioned predict command), and ARIMA is often used in econometrics, but even this approach has limitations. It is designed for so-called non-stationary processes whose trend can be removed by differencing, usually a polynomial trend such as a linear or quadratic one; an exponential trend typically needs a log transform first.
Widely used models like ARIMA have a background in econometrics and include seasonal aspects in addition to trends. This leads to the question whether these aspects are really the right ones to describe your observations or whether they just make your analysis more complex.
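The first two assumptions in the list can be illustrated with a small Python sketch. It is not the R code from the blog mentioned above but a hand-written least-squares fit of a straight line, with invented account balances and an invented sequence of die rolls. The coefficient of determination (R²) is used here as a rough indicator of whether a linear trend describes the series at all.

```python
def fit_line(ys):
    """Least-squares fit y = a + b * t for t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def forecast(ys, steps):
    """Extrapolate the fitted line for the next `steps` time instants."""
    a, b = fit_line(ys)
    return [a + b * (len(ys) + i) for i in range(steps)]


def r_squared(ys):
    """Share of the variance explained by the fitted line (0..1)."""
    a, b = fit_line(ys)
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * t)) ** 2 for t, y in enumerate(ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical balances on the first of each month: a clear linear trend
balances = [100.0, 50.0, 0.0, -50.0, -100.0]
# Hypothetical die rolls: no trend to extrapolate
rolls = [3, 6, 1, 4, 2, 5]

print(forecast(balances, 2), r_squared(balances))
print(forecast(rolls, 2), r_squared(rolls))
```

The code will happily produce a forecast for the die rolls as well. Only the very low R² hints that the number is meaningless, which is the point of the list above: the tool computes a result, but somebody has to judge whether the model fits the data.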

So let me summarize: using methods from inferential statistics doesn’t simply produce correct results, because without further analysis we can’t be sure that our model is sound. In fact the statistical model is the holy grail of every statistician: he is always looking for the truth of the model and does a lot of work to discuss its strengths and weaknesses. When I was at university, statistics was a very conservative science that was proud to be the right tool to discover laws of nature as well as of economics, and this is something different from decision support in enterprises.

The Rise of Data Mining

Unfortunately, huge enterprise data sets often don’t follow simple laws of “nature”. Moreover, there are serious practical problems: we have misrecorded entries in the database, and the data is dispersed over different database systems and may be time dependent. This can be a nightmare for statisticians, who know all about the possible pitfalls: structures that appear in an ad hoc analysis may occur by pure chance and don’t indicate an underlying law.

So many practitioners choose a different approach for analyzing huge datasets: instead of looking for a perfect model they rely on algorithms. They got very creative and chose methods from computational geometry as well as machine learning and even semantic technologies to tackle large datasets. And so it’s not surprising that SAP’s Predictive Analysis Library for HANA offers lots of data mining techniques. So what’s inside SAP’s Predictive Analysis Library for HANA? Besides the above-mentioned algorithm for linear regression and common folklore like ABC analysis, there are implementations of data mining algorithms:

One algorithm is the so-called k-means clustering, which is often used in data mining and partitions observations into cells (clusters). As a consequence we can get an overview of the structure of the data. BTW: there are R implementations for that task, too: http://www.r-bloggers.com/k-means-clustering-on-big-data/.
Other algorithms come from machine learning: PAL can create decision trees from input data that make classifications of future data possible. Another, simpler algorithm is KNN (k-nearest neighbors), which maps an observation to other data by calculating distances. PAL also offers the apriori algorithm, which calculates associations that can be seen as inference rules.
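To give an impression of what such an algorithm does, here is a minimal k-means sketch in plain Python. It is not PAL's implementation and does not show how PAL is called: the data points are invented, and using the first k points as starting centroids is a deliberately naive assumption to keep the example deterministic.

```python
from math import dist


def assign(points, centroids):
    """Put every point into the cell of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, iterations=20):
    """Alternate between assigning points and moving centroids
    until the centroids stop changing (or the iteration limit is hit)."""
    centroids = list(points[:k])  # naive start: the first k points
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep the centroid of an empty cell
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical observations with two obvious groups
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Note that nobody told the algorithm what the two groups mean. It only reports a structure in the data, and interpreting that structure is still the job of someone who knows the business.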

Data mining offers techniques to analyze huge datasets. It uses statistical methods in a very pragmatic way and sometimes leaves the ground of “hard statistical science”.

Even if our mathematical models may not be accurate enough to deal with large data sets (seen from the perspective of pure science), we can’t afford to ignore these data sets because they are strategic assets for business decisions.

HANA will enable us to better understand data. SAP Mentor Thorsten Franz explained this in his blog about the true value of BW for HANA: HANA will bring us agility and the possibility of deeper insight into our business. If we want to go a step further we will use methods of data mining and sometimes inferential statistics to analyze transactional data. For you as an ABAP developer this will have the following consequences:

To perform code pushdown you will have to learn SQL and SQLScript and should know about the most important libraries like PAL and BFL.
Visualization is the most effective way to deal with data, and this is true for SAP users as well as for people working in data mining and statistics. Dashboards give immediate business value, so you should learn how to create mashups.
In the future there will be more and more synergies between different technologies and solutions of SAP’s product portfolio, so basic knowledge of these technologies, especially the BO tools, will be helpful for ABAP developers creating mashups.

ABAP developers won’t become statisticians, but in the future data miners and statisticians will perform analyses of operational data. Since ABAP developers are usually the experts for the data model of SAP Business Suite, you will work closely together with statisticians and data mining experts.