
SAP HANA and In-Memory Computing


Dynamic Tiering Landscape Setup:

 

The Dynamic Tiering feature [the SAP HANA Extended Storage option] has been supported since SAP HANA SPS 09.

 

This blog is about my experience with the Dynamic Tiering setup. It should help you set up Dynamic Tiering in your own landscape.

 

I have two Linux hosts [which I named LinuxHost1 and LinuxHost2] with the same configuration and the same root user and password.

 

The HANA server will be installed on LinuxHost1 and the ES (Extended Storage) server on LinuxHost2.

Both components cannot be installed on the same Linux machine.

 

 

Shared File Systems:

The following file systems must be shared (shared persistence) between the two Linux hosts.

[I am going to install the HANA system with SID HDT.]

 

  1. /hana/shared
  2. /hana/data/HDT
  3. /hana/log/HDT
  4. /hana/data_es/HDT
  5. /hana/log_es/HDT

 

Installation Parameters:

 

I install the HANA system using a batch file. [The batch file is stored on LinuxHost1.] [File format: *.cfg]

The following parameters, in particular, have to be specified for the Dynamic Tiering setup.


1. addhosts:

Here we specify LinuxHost2 and its role.

 

2. component_dirs:

In the component_dirs parameter, we specify the locations of the HANA DB server installation package and the HANA option (Dynamic Tiering) installation package.

The format is component_dirs = <HANA_Installation_Path>,<HANA_Option_Installation_path>; a comma separates the two installation packages.

 

 

Here is how to find the location of an installation package: we have to choose the folder path that contains the manifest file.

<Component_dirs>: location of the <HANA_Installation_Package>:

<Component_dirs>: location of the <HANA_Option_DynamicTiering_Path>:

So these two folder paths are the values given in the <component_dirs> parameter.

 

3. Other Required Parameters:

Then we specify all the other required parameters in the batch (cfg) file, such as SID, instance number, DB mode (single container), sapadm password, system administrator (<SID>adm) password, SYSTEM user password, user ID, group ID, sapmnt path, data path and log path.
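For illustration, a minimal configuration file might look like the following sketch. The host names and paths are placeholders, and the exact parameter and role names (e.g. extended_storage_worker) should be checked against the hdblcm configuration file template for your revision:

[Server]
sid=HDT
number=00
components=all
component_dirs=/install/HANA_SERVER,/install/HANA_DT
addhosts=LinuxHost2:role=extended_storage_worker
sapmnt=/hana/shared
datapath=/hana/data/HDT
logpath=/hana/log/HDT
sapadm_password=...
password=...
system_user_password=...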

 

 

 

Installation Process:

Once the configuration file is ready, we start the installation using hdblcm with that configuration file.
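For example (a sketch; adjust the path and file name to your landscape):

./hdblcm --configfile=/install/HDT_install.cfg --batch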

 

During the installation process, the components to be installed are listed.

The components are

  1. Server (SAP HANA DB)
  2. ES (Dynamic Tiering)

Select the option 'all', which installs the HANA server on LinuxHost1 and the Dynamic Tiering setup [ES server] on LinuxHost2.

 

The installation process then starts.

 

LinuxHost2 becomes an additional host and acts as the Extended Storage server.

 

Once the installation process has completed, the HANA server is installed on LinuxHost1 and the Dynamic Tiering setup [ES server] on LinuxHost2.

 

List of Processes:

Once the installation has completed successfully, let's check the running processes on the two hosts.

 

LinuxHost1: (HANA DB)

 

The name server, preprocessor, compile server, index server, statistics server and XS engine are running on LinuxHost1.

 

LinuxHost2: (ES)

 

The name server and the ES server are running on LinuxHost2.

 

Studio:


I connect to the HDT system from the Studio. It displays the list of servers from both hosts.

 

 

 

How to Get Dynamic Tiering Running:

After the installation has completed, if we check the system, we can see the information that Dynamic Tiering is installed but not running.

 

Execute the query,

SELECT * from M_SERVICES
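Or, to look at just the relevant service (assuming the standard M_SERVICES columns):

SELECT HOST, SERVICE_NAME, ACTIVE_STATUS, COORDINATOR_TYPE FROM M_SERVICES WHERE SERVICE_NAME = 'esserver';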



The coordinator type for esserver is NONE.

So Dynamic Tiering is installed but not running. How can we get Dynamic Tiering running?

 

Create Extended Storage in LinuxHost2:

 

To get Dynamic Tiering running, we have to create extended storage on LinuxHost2.

 

Execute the following SQL query on the HDT system.

CREATE EXTENDED STORAGE AT '<full_host_name_of_LinuxHost2>' SIZE 10 GB ENABLE DELTA;


Now check M_SERVICES:

 

 

Now the coordinator type for esserver is MASTER and Dynamic Tiering is running.

 

 

Extended Storage Table Creation:


We have to use the USING EXTENDED STORAGE clause in the SQL query to create an extended storage table.

 

SQL Query:

CREATE TABLE "SHANMUGAVEL".t2(A INT, B INT NULL, C INT NOT NULL) USING EXTENDED STORAGE;

 

Extended Storage Table T2 is created.

 

Insert:


Now I insert data into the extended storage table; it will be stored in extended storage.


Insert into "SHANMUGAVEL"."T2" values (1, 1, 1)



Select:


Let's check the data preview of the extended storage table.
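In SQL, the same check is simply:

SELECT * FROM "SHANMUGAVEL"."T2";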


 

The data is retrieved from extended storage.



Regards,

Muthu

Hi everyone, in this blog post I'm going to show you how to query data across databases. As we know, the concept of multitenant database containers (MDC) was introduced in SAP HANA SPS09. With MDC we can now run multiple tenant databases on one SAP HANA system, which also enables strong separation of data and users. But did you know that cross-database queries are also supported in SAP HANA SPS09? For example, it's possible for a user in one tenant database to query data from another tenant database. But how can we achieve this? Is it enabled by default? Maybe you have a lot of questions. Don't worry. Let's first have a look at some basics of cross-database queries, and then I'll show you an example step by step.

 

Related materials

First of all, I'd like to share with you some useful materials about this topic. You can find everything I show in this post in the following materials. If you're interested, you can read them and build the same example along with me.

 

 

Key concepts

Here I just borrowed page 41 from SAP HANA SPS 09 - What's New? - Security, since I think this page highlights the key concepts well.

 

1.PNG

 

The key concepts can be summarized as:

  • Cross-database queries are not enabled by default. Thus we need to configure it.
  • The mechanism of cross-database queries is based on "remote identity" which means User_1 plays the role of User_2 when he/she queries data from Table_A in the above example.
  • A user in the target database can only be associated with one user in the source database, which means User_2 cannot be associated with User_3.
  • The association is unidirectional.
  • Only the SELECT privileges are considered, which means User_1 is not allowed to insert data into Table_A.

 

Scenario

Based on the key concepts, we can create a scenario to test the cross-database queries in SAP HANA SPS09 MDC environment.

 

2.PNG

 

As the above image displays,

  • there are three tenant databases in our scenario, i.e., DB1, DB2 and DB3.
  • Meanwhile, there are three users in different tenant databases, USER_A in DB1, USER_B in DB2 and USER_C in DB3.
  • USER_A has a table named PRODUCT which holds products in China.
  • USER_B has a table named PRODUCT as well which keeps products in the US.
  • Now USER_C wants to create a PRODUCT view which unions all products in China and in the US as well.

 

Now comes the problem. How can we achieve this?

 

Step by step example

Notice: At the beginning of each step, I'll first show the <DATABASE>(<USERNAME>) used in that step.

 

1. SYSTEMDB(SYSTEM) Create three tenant databases

CREATE DATABASE DB1 SYSTEM USER PASSWORD Manager1;
CREATE DATABASE DB2 SYSTEM USER PASSWORD Manager2;
CREATE DATABASE DB3 SYSTEM USER PASSWORD Manager3;







 

2. SYSTEMDB(SYSTEM) Make the configuration shown in the red box at the system layer: Administration -> Configuration -> global.ini -> cross_database_access

 

3_.png

 

We first enable the cross-database queries and then set DB1 and DB2 as targets for DB3.
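The same settings can also be applied with SQL from SYSTEMDB instead of the Configuration editor – a sketch based on the parameter names shown above (the targets_for_<dbname> key is assumed to follow that naming pattern):

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET
  ('cross_database_access', 'enabled') = 'true',
  ('cross_database_access', 'targets_for_DB3') = 'DB1,DB2'
WITH RECONFIGURE;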

 

3. DB3(SYSTEM) Create USER_C in the tenant database DB3

CREATE USER USER_C PASSWORD aA111111;







 

4. DB1(SYSTEM) Create USER_A in the tenant database DB1 and associate USER_A with USER_C in the tenant database DB3

CREATE USER USER_A PASSWORD aA111111;
ALTER USER USER_A ADD REMOTE IDENTITY USER_C AT DATABASE DB3;







 

You can also use CREATE USER ... WITH REMOTE IDENTITY ... statement directly like in step 5.

 

5. DB2(SYSTEM) Create USER_B in the tenant database DB2 and associate USER_B with USER_C in the tenant database DB3

CREATE USER USER_B PASSWORD aA111111 WITH REMOTE IDENTITY USER_C AT DATABASE DB3;







 

You can also use CREATE USER plus ALTER USER statements like in step 4.

 

6. DB1(USER_A) Create the PRODUCT table and insert some products.

CREATE COLUMN TABLE USER_A.PRODUCT (
  ID INTEGER,
  NAME NVARCHAR(100),
  LOC NVARCHAR(100),
  DESC NVARCHAR(100)
);
INSERT INTO USER_A.PRODUCT VALUES (1001, 'Product A', 'CN', 'Made in China');
INSERT INTO USER_A.PRODUCT VALUES (1002, 'Product B', 'CN', 'Made in China');







 

7. DB3(USER_C) Can't wait anymore? OK. Let's first let USER_C query products made in China

SELECT * FROM DB1.USER_A.PRODUCT;







 

Here you go.

 

4_.png

 

8. DB2(USER_B) Create the PRODUCT table and insert some products

CREATE COLUMN TABLE USER_B.PRODUCT (
  ID INTEGER,
  NAME NVARCHAR(100),
  LOC NVARCHAR(100),
  DESC NVARCHAR(100)
);
INSERT INTO USER_B.PRODUCT VALUES (2001, 'Product C', 'US', 'Made in US');
INSERT INTO USER_B.PRODUCT VALUES (2002, 'Product D', 'US', 'Made in US');







 

9. DB3(USER_C) Now let USER_C query products made in the US.

SELECT * FROM DB2.USER_B.PRODUCT;






 

5_.png

 

10. DB3(USER_C) Create the PRODUCT view and query the data from this view.

CREATE VIEW USER_C.PRODUCT AS
SELECT * FROM DB1.USER_A.PRODUCT
UNION ALL
SELECT * FROM DB2.USER_B.PRODUCT;
SELECT * FROM USER_C.PRODUCT;






 

6_.png

 

We've now successfully achieved the scenario we created.

 

Want more?

- DB3(USER_C) Inserting data into table DB1.USER_A.PRODUCT fails, since only SELECT is allowed.

INSERT INTO DB1.USER_A.PRODUCT VALUES (1003, 'Product E', 'CN', 'Made in China');





 

7.PNG

 

- DB1(USER_A) Letting USER_A query data from the view DB3.USER_C.PRODUCT fails, since the association is unidirectional.

SELECT * FROM DB3.USER_C.PRODUCT;




 

8.PNG

 

- DB2(SYSTEM) Associating USER_B with USER_A in the tenant database DB1 fails, since only one associated user is allowed.

ALTER USER USER_B ADD REMOTE IDENTITY USER_A AT DATABASE DB1;



 

9.PNG

 

- DB1(SYSTEM) Have a look at remote users.

SELECT * FROM REMOTE_USERS;



 

10_.png

 

You can do the same with DB2(SYSTEM).

 

 

That's it! Hope you enjoyed reading my blog and making cross-database queries successfully.

With SAP HANA SPS 09, the SAP HANA database lifecycle manager (HDBLCM) has evolved to support system integration for three of the new SAP HANA options. SAP HANA options are separately licensed products that can be used in combination with the SAP HANA platform.

 

Specifically, the SAP HANA system can be installed or adapted to support:

  • SAP HANA accelerator for SAP ASE
  • SAP HANA dynamic tiering
  • SAP HANA smart data streaming

 

If you're interested in using SAP HANA platform LCM tools to create an SAP HANA system which includes one of the SAP HANA options, take a look at the following short videos:

 

SAP HANA Options: Part 1 System Installation – Demos a multiple-host system installation of the SAP HANA server and SAP HANA dynamic tiering

 

 

 

 

SAP HANA Options: Part 2 Host Addition – Demos an SAP HANA smart data streaming host addition to an existing SAP HANA single-host system

 

 

 

 

SAP HANA Options: Part 3 Failover Host Groups – Demos a multiple-host system installation of the SAP HANA server and SAP HANA accelerator for SAP ASE in failover host groups, with a test of the failover groups in the SAP HANA studio

 

For more information about the SAP HANA platform lifecycle management tasks featured in these videos, check the:

SAP HANA Installation and Update Guide

SAP HANA Administration Guide - Section 2.9.1 SAP HANA Platform Lifecycle Management

 

For more information about the SAP HANA options featured in these videos, check the:

SAP HANA Accelerator for SAP ASE: Installation Guide

SAP HANA Dynamic Tiering: Installation Guide

SAP HANA Smart Data Streaming: Installation Guide

In an SPS 09 MultiDB system, the Web Dispatcher processes inbound HTTP and HTTPS connections to XS services.

In an SPS 09 system there can be multiple tenant databases. Each tenant database has its own XS engine, while a single Web Dispatcher is available per HANA instance, so the Web Dispatcher is shared by all tenant databases.

The Web Dispatcher receives the user's HTTP request and routes it to the corresponding XS engine port using the [wdisp/system_1] parameter.

 

 

The HTTP request address is specified in SRCVHOST. Via the Web Dispatcher, the request is routed to XS engine port 3xx07

(xx = HANA instance number)

 

Problem Statement:

 

The Web Dispatcher service was down and showed red (I don't know what caused the issue), and the system also showed red.

 

 

I tried the following options but could not bring the system back to a normal condition.

 

1. Setting the Web Dispatcher instances to 0 and then changing the instances back to 1

2. HANA system restart

3. Starting and stopping the tenant DB index servers

 

Nothing worked, and finally I found the following workaround for this problem.

 

Solution:


1. In daemon.ini, set the instances value for the Web Dispatcher to 0.
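If you prefer SQL to editing daemon.ini in the Studio, a statement along these lines should have the same effect (a sketch; the section and key names follow the daemon.ini layout visible in the Studio, and the same statement with '1' is used again in step 4):

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('webdispatcher', 'instances') = '0' WITH RECONFIGURE;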

 

 

2. Then we have to remove the Web Dispatcher from the name server topology:

ALTER SYSTEM ALTER CONFIGURATION ('topology.ini', 'system') UNSET ('/host/<hostname>', 'webdispatcher') WITH RECONFIGURE



3. Remove the parameter [wdisp/system_1] from the profile section of webdispatcher.ini.
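For reference, removing the profile parameter can also be done with SQL (a sketch, assuming the parameter sits in the 'profile' section of webdispatcher.ini as shown in the Studio):

ALTER SYSTEM ALTER CONFIGURATION ('webdispatcher.ini', 'SYSTEM') UNSET ('profile', 'wdisp/system_1') WITH RECONFIGURE;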



4. Now, in daemon.ini, set the instances value for the Web Dispatcher back to 1.

 

 

5. After the parameter value is set to 1, the Web Dispatcher service starts, and the service turns green.

 

 

 

6. Then set the parameter [wdisp/system_1] again in the profile section of webdispatcher.ini.

 

 

Now the Web Dispatcher issue in SPS 09 is solved.



I remember my initial encounters with HANA extremely vividly. One of them was a conversation with Paul Barker, who was an early HANA addict. We discussed building SQL models in HANA, and then the benefits of HANA Information Views. He was wide-eyed about the benefits of using HANA CE Functions, purporting insane performance increases.

 

What are CE Functions?

 

HANA Calculation Engine Functions (or CE Functions) are a mechanism introduced in HANA SPS02 which allows direct access to the HANA column store. They allow extremely precise control over how individual columns are projected, calculated and aggregated. In fact, they are a direct representation of how the original HANA column store modeled vector functions.

 

Why were CE Functions so fast?

 

They completely bypassed the SQL optimizer and meant that whatever you wanted to execute was run in the column store, so you always got the full performance of HANA. In the early days of HANA, SQL could be extremely inefficient and cause HANA to materialize large volumes of data in the SQL engine, in row format. This caused big memory bubbles and 100-200x slower performance.

 

By comparison, CE Functions always ran in the column engine and, provided you were returning modest result sets, were highly efficient. They effectively compile into a language called L, which was HANA's intermediate language and is very similar to C++.

 

In 2011 and 2012, I very frequently coded high-performance aspects of a HANA solution in CE Functions.

 

What were the downsides to CE Functions?

 

CE Functions are part of the HANA SQLScript language, and as such they are an extension of ANSI SQL. That's neat, but it also means that there aren't many spare operators, and the syntax could get really funky. Sometimes you need a single quote ('), sometimes double quotes ("). Sometimes double single quotes (''), or triple single quotes ('''). CE Functions are extremely picky!
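To give a flavour of that syntax, here is a minimal SQLScript fragment using CE Functions – a sketch with a hypothetical table and columns, not production code:

lt_sales = CE_COLUMN_TABLE("MYSCHEMA"."SALES", ["REGION", "AMOUNT"]);
lt_proj  = CE_PROJECTION(:lt_sales, ["REGION", "AMOUNT"], '"AMOUNT" > 100');
var_out  = CE_AGGREGATION(:lt_proj, [SUM("AMOUNT") AS "TOTAL_AMOUNT"], ["REGION"]);

Note the mix of square brackets, double quotes and single quotes – exactly the kind of pickiness described above.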

 

What's more, you have to specify every column, and with complex procedures, this means a lot of copy and paste, and activation of code. The editor wouldn't give you much help, and the error messages were terse. In short, you have to be an expert SQL programmer and have a very good working knowledge of the HANA column store, to write good SQLScript with CE Functions.

 

But if you wanted performance, wow - this was HANA speed.

 

Along came the BW on HANA...

 

And so in 2012, along came "Project Orange", or BW on HANA, with HANA SPS04. The HANA database had matured enough to work for SAP BW, but BW never used CE Functions. Instead, the BW team had their own proprietary way into HANA.

 

This soon turned out to be a limitation within BW, and the Calculation Scenario concept was born. Now, when you compile a BW object in BW 7.4, a HANA view is created, which can be accessed either via BW's OLAP engine, or directly into the HANA database. The two are interchangeable, but CE Functions were way too restrictive for BW's needs.

 

... shortly followed by Business Suite

 

Then in 2013 came Business Suite on HANA. The Business Suite makes extensive use of ANSI SQL via the OpenSQL interface, and it debuted with HANA SPS06. Due to the way the Business Suite is programmed, the HANA team had to make SQL much faster than it was before, and a ton of development went into the SQL optimizer. As of HANA SPS06, there is often little difference between SQL and HANA Information Views.

 

Let's not forget HANA Live

 

HANA Live is programmed only in HANA graphical calculation views. These had a habit of performing badly if not designed very well, so the HANA development team put a ton of effort into making them perform great. They now compile into calculation scenarios, which are an XML representation of columnar projections and calculations.

 

What happened to CE Functions along the way?

 

Over half of SAP's HANA customers use BW, and Business Suite is the fastest growing product by go-lives. Neither of these makes any use of CE Functions, and so they got extremely limited development attention and few new features.

 

Coupled with the fact that they are inflexible and hard to use, the other mechanisms - SQL, and Graphical Calculation Views, got all the attention. CE Functions didn't get any further optimizations, and stayed the same since HANA SPS05. They didn't get worse - the other options just got better and better.

 

As of HANA SPS07, we found that Graphical Calculation Views were always faster than CE Functions. In HANA SPS08, we find that SQL, Analytic Views and Graphical Calculation Views all have exactly the same explain plans in many cases. They all perform the same and it's just a question of preference how you design things.

 

In fact, extremely complex views can be created as cascaded Calculation Views, and the view compiler will collapse and simplify the views and create a single optimized set of column projections, calculations, vector joins and unions as a single Calculation Scenario. This is truly the jewel in HANA's crown.

 

CE Functions - Rest in Peace

 

The CE Function is an important part of HANA's past, but there is no longer a reason to use them. They are less flexible, slower, and harder to program. We expect that in a forthcoming release of HANA, they will be deprecated and eventually de-supported.


The only time I use CE Functions is when the HANA SQL optimizer has a poor execution plan. Because CE Functions and SQL don't share engines, it's possible to control the HANA SQL optimizer using CE Functions. That's an extremely unusual scenario these days.

 

Do note that the Scripted Calculation View still has its place - you can call out to Business Functions, Predictive Functions and other stored procedure objects in SQLScript. However the days of this are limited, because in HANA SPS09 it's now possible to do some of this via the Graphical Modeler. I expect by HANA SPS10 or 11, the Scripted Calculation View will no longer be required.

 

So whilst they haven't been officially deprecated yet, please celebrate their passing, and don't use them any more.

Introduction


This blog is intended to showcase a case study of a massive data + report migration (AS400 -> HANA) project during a large-scale, end-to-end greenfield HANA Enterprise implementation. HANA Live RDS was already deployed in previous phases to set up the basic building blocks.

 

Objective

 

Earlier this year, I was presented with the challenge of migrating a large number of Crystal Reports from legacy AS400 applications to a state-of-the-art HANA Enterprise landscape.

A few main obstacles I identified early during planning:

  1. Large number (~150) of Crystal Reports with a steep timeline
  2. Complexity of logic in the reports – extreme hardcoding, restrictions on AS400 ERP hard codes/sub-reports (just like traditional hardcoded ABAP programs in another ERP system)
  3. Of course, we also wanted to migrate up to 2 years of AS400 data history (all sales, finance & master data) along with the reports (yayyyy!!)

 

Approaching Solution

 

During several blueprint sessions we realized that this massive Crystal architecture had been developed with the technology available at the time (AS400 MAPICS ERP + Crystal Designer 2008). It had matured over the years, and if we had to rebuild it (with history data), it was going to take a really long time. Also, due to audit and validation requirements, a collective decision was made to keep every Crystal Report AS-IS and model the SAP HANA backend to match the existing AS400 backend.

 

Wait a minute, what? Did I hear that right? Yes, you did.

 

A couple of rationales behind this decision:

 

  1. Analysis of the existing AS400 MAPICS ERP suggested that there are only a handful of master data, sales, billing and finance tables in AS400 (~30) on which all these reports are built. These are much simpler tables than SAP's, and modeling SAP data to match the legacy structures might work. (Still "might", because I'm sure no one had done it before.)
  2. The Crystal Reports migration procedure lets us swap the data source of a report if all technical field names match exactly. This makes report migration as simple as flipping a switch on the source systems. Minor tweaks need to be done, but there is no need to rewrite the report on the new source.

 

Challenges that we envisioned

 

  1. Of course these are two different ERP systems with totally disparate table structures – seriously, are we going to harmonize them?
  2. Data type mismatches – (let's not even start on those!)
  3. How about hardcoded codes/types? They won't even exist in SAP.
  4. Date types (AS400 – CYYMMDD – century y2k format, SAP – yyyymmdd); see the conversion sketch below
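For the date formats in particular, one common way to handle CYYMMDD (not necessarily the exact approach used here) is simple arithmetic: the century flag C is 0 for 19xx and 1 for 20xx, so adding 19000000 yields yyyymmdd, e.g. 1150315 becomes 20150315. A sketch in SQL, with hypothetical table and column names:

-- LEGACY_DATE is a hypothetical INTEGER column holding a CYYMMDD value such as 1150315
SELECT TO_VARCHAR(LEGACY_DATE + 19000000) AS SAP_DATE FROM "LEGACY"."AS400_SALES";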

HANA Complex Union - 1.png

Solution

 

UNION…UNION…UNION (the HANA union node, of course) was running through my mind when I first started thinking about this project, and that's the recipe.

 

Prep

  1. Reverse-engineered all Crystal Reports to document which backend data elements are being used.
  2. Documented all relevant AS400 tables with exact technical names / data types.

 

Design

  1. Performed a massive mapping exercise to map AS400 tables to SAP tables / data elements, only for those fields which are used on the reports (from step 1.a).
  2. HANA Live RDS was already deployed in previous phases, so the majority of SAP data/structures were in place.

 

Realization

  1. Created a new schema to house the history data (same name to be used in dev/QA/prod for simplicity).
  2. Created the AS400 table structures in the new schema (WITH EXACT field technical names). E.g. below – all field technical names are driven by the AS400 field names.

HANA Table with AS400 structure.png

 

3. Developed HANA calculation views with a union node and, initially, only AS400 data (right node below)

 

HANA Union - Sample.png

4. Once we had all the HANA calculation views (with exactly the same technical names), Crystal Report migration began in parallel (even without the SAP data mapping being complete)

 

5. Now comes the fun part – actually modeling the SAP HANA data to match the AS400 structure using our mapping documents. Some of the calculation views became very complex, with several levels (~25-30) of joins from base ECC tables. A couple of the simpler structures are shown for your reference.

 

HANA Complex Union - 1.png

 

Another Example

 

HANA Complex Union - 2.png

 

Mappings in UNION

 

The AS400 part maps one to one because the target HANA structure matches the legacy table structure (remember, we are modeling HANA to the AS400 structures).

 

HANA Union Mapping - AS400 with SAP.png

 

But SAP data needs to be mapped one by one as follows:

 

HANA Union Mapping - AS400 with SAP 2.png

 

6.  We developed several mapping cross-reference tables to harmonize hardcodes (e.g. GL account mapping between the two ERP systems). So SAP data gives us the new code, which is translated using the cross-reference, and the Crystal Report continues to use the old code.

 

7.  Once complete, new SAP HANA data magically appeared on the EXISTING Crystal Reports!!!

 

Migration/Cutover


  1. Obtained the historic data in CSV files and loaded it using the file import wizard in the dev system.
  2. Reconciled the data with the business to obtain sign-off.
  3. Once the data was signed off in the dev/QA system, we actually transported the table structures WITH data to avoid data load times. This turned out to be a great decision and saved us huge time (THUMBS UP HANA Enterprise…$$$ saved!!!)
  4. Over a few weeks (a couple of month-end closes) most of the Crystal Reports stabilized. Of course we had issues, but all changes and enhancements were super quick due to HANA modeling.

 

CONCLUSION

 

  1. Overall project delivery was accomplished in around a 7-month timeframe with a handful of BO Crystal developers and senior HANA / BW / ECC architects.
  2. HANA Enterprise (with real-time SLT) significantly reduced implementation costs for this seemingly impossible gig. I cannot imagine accomplishing something this complex with BW extractors and MultiProviders.
  3. I don't think this is a great solution from a performance perspective, and the HANA union may not be smart enough to branch out to just the specific data set (like our MultiProviders); but overall the ease of HANA modeling and the performance of columnar storage saved the day!
  4. This clearly proves the flexibility and agility of HANA modeling to literally mold SAP to any other legacy ERP.

 

Hats off to our team and of course HANA Enterprise!! Cheers!!!

In the first blog [The Challenge of Airline Air Day of Operations Recovery (Part 1 of 2) ], I discussed the challenges airline companies face when there are disruptions to service and how operations research running on the SAP HANA database can resolve these issues. This blog examines the process involved in rescheduling passengers and crew compliantly when disruptions occur.


Sophisticated and specialized technology at work

As discussed in the previous blog (The Challenge of Airline Air Day of Operations Recovery (Part 1 of 2) ), responding to airline service disruptions can be complicated. However, using the SAP HANA database, a sophisticated data model, complex algorithms, and optimization, SAP data scientists have solved this issue – and today, you’ll see how.

With the power of SAP HANA, a centralized data hub integrates real-time operational and planning data, all business and governance rules, and extensive volumes of data on aircraft, crew, and passengers. With a single source of truth, the data is then modeled and optimized.

The result is an enumeration of new legal and compliant potential schedules for aircraft, crew, and passengers that is executed within minutes rather than hours.

 

Passengers are easily rescheduled – within seconds

When service disruptions occur, airline companies want to accommodate stranded passengers as quickly as possible. And since there are few legal rules governing the rescheduling of passengers, this part is relatively easy. In fact, there is a four-step process that this functionality automates for new passenger schedules:

 

PAXrecovery.jpg


With new schedules available in seconds or minutes, help desks no longer have to search for option after option, so queue times are virtually eliminated and passengers can get to their destinations much more quickly.

 

The complexity of rescheduling crews resolved within seconds too

The process of rescheduling crews – which, of course, are mandatory for any flight – is much more complicated. There are many more regulatory, union, and structural rules to take into consideration, such as:

  • Working-time limitations for crew
  • Rest-time requirements between working periods, especially related to fatigue management
  • Licenses of pilots for certain types of aircrafts or landing at certain airports
  • Seniority rules for pilots on certain flights

 

Hence, the rules-processing part of this functionality, which is an integral part of the decision-making systems for the airlines, is much more complex. There are extensive pairing and rostering options that must be optimized to accommodate all of the rules for the crew and scheduled aircraft flights.


It’s easy with SAP HANA

With the power of SAP HANA, data models, algorithms, and optimization, thousands of data points on flight updates for crews are processed every second to enumerate potential schedules. A disruption manager within the software identifies any illegal routes, and then algorithms generate other legal options.

Once the ideal scenarios – those that match FAA rules and are fully compliant – have been identified, the crew can immediately be notified on their mobile phones, resulting in far fewer cancellations. And all of this occurs in near real time.

The process looks something like this:

 

Crew recovery.jpg


To see the full power of this solution, you can view a demonstration of this in the video:

 


As you can see, this functionality offers a much faster solution to airlines, with quicker response to disruptions, a more optimal recovery process, and improved customer satisfaction.

Hi all,

 

My name is Man-Ted Chan and I'm from the SAP HANA product support team. I'm writing this blog about the SAP HANA lifecycle management tool in SP8, hdblcm (SAP Note 1997526). The plan for this blog is to initially post about installing HANA (and components) using hdblcm (also taking a detour to add a node), and then, when SAP HANA SP9 comes out, I will upgrade using the new hdblcm that comes with SP9.

 

So where is the hdblcm tool? It is found in the install media (if this is downloaded from the SAP Service Marketplace you will need SAPCAR on your system to un-archive the file). It can be started in two ways, by calling ./hdblcm or ./hdblcmgui.

 

Command Line Installation and Updates

You can run hdblcm in batch mode, so that you can just pre-enter the values, or in interactive mode (in these captures I am using interactive mode). In the following captures please note the highlighted 'Detected components' area, as I only have the server file un-archived; later I will have more files un-archived.

1.png

 

2.png

After the installation was complete, I un-archived my other SAR files and re-ran ./hdblcm

3.png

We see that the system I just installed is present and available to update (indicated by the red arrow); also notice the other detected components. If you are not seeing your components, you can set the component location by using one of the following options:

--component_dirs=<path1>[,<path2>]...  Component directories

--component_medium=<dvd_path>          Location of SAP HANA installation medium

--component_root=<path>                Directory including sub-directories to search for components
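For example (the paths here are just placeholders):

./hdblcm --component_dirs=/hana/media/SAP_HANA_DATABASE,/hana/media/SAP_HANA_CLIENT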

In this example I decide to install the HANA client

4.png

5.png

Installation and Updates Via the GUI

When you choose to run hdblcmgui, it will look like this upon loading (please note I ran this after the HANA installation)

6.png

*Note that there is an ‘Add Component Location’ button

I will first go over the screens for installing a new SAP HANA database

7.png

8.png

Next you select whether this will be a multi-host or single-host system (in this blog I chose a single-host system)

9.png 10.png 11.png 12.png 13.png 14.png 15.png 16.png

These captures show updating the existing system

17.png 18.png 19.png

Adding A Host Via HDBLCM

To add a host to your system you can use HDBLCM, either through the GUI or the command line (in this document I will show it via the command line)

Call ./hdblcm --addhosts=<server name>

You can optionally specify more servers, group, role, and storage partition

For information on the options available please refer to the SAP HANA Admin guide or run
./hdblcm --action=add_hosts --help

In this demo I will be specifying the role
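Specifying the role on the command line looks roughly like this (the server name is a placeholder; see the --help output above for the list of valid roles):

./hdblcm --action=add_hosts --addhosts=<server name>:role=worker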

20.png

From this list, select 1 to add the host; you will then be prompted to enter the server information

21.png

After agreeing to continue the prompt will look like this:

22.png

 

As noted before, I'll write a second part to this as I venture to SP9

This is the first in a series of two blogs that discuss the challenges airline companies face when there are disruptions to service and how these issues can be resolved in real time.

 

When it comes to technology usage, airline companies are quite sophisticated, with solutions for information technology, customer management, finance, procurement, asset management, human resources, and back-office processes.

 

But there is one area where there is a need for more effective technology – and that’s in day-of-operations, and in particular, in the area of interruptions to service.

 

Planning for the unplanned

 

Unplanned events, such as severe weather conditions, unscheduled maintenance, safety risks, or strikes, are unfortunately a cause of frequent disruptions – which are highly inconvenient and often costly. The innovative planning and scheduling tools airlines use today are typically challenged when it comes to solving these kinds of disruptions, and therefore the airlines can't execute recovery processes as quickly as they would like.

 

The primary cause of this dilemma is that there is too much data from too many sources – and there are too many legal and regulatory rules and regulations – which makes it difficult to have a real-time single source of truth. Consequently, airlines or air traffic controllers can’t readily identify where aircraft, crew, and passengers are right now in this moment – with or without disruptions. Without that knowledge, determining compliant new schedules during disruptions becomes nearly impossible.

 

Too much data – and too many rules

 

For instance, there is complex operational and planning data – such as flight routes, aircraft, crew members, maintenance, and passengers – that is scattered in a variety of resources. In addition, there are compliance, business, and legal requirements governing each of those entities as well. Airlines have attempted to build in-house applications and data models needed to determine optimal rescheduling options. But few if any have succeeded in building a tool sophisticated enough to handle the immense complexity that compliant resolutions require.

 

Therefore, most airline planners and aircraft controllers must rely on a manual and tedious process that is error-prone, time-consuming, and a compliance risk. While the process is being worked out by hand – which can take hours or even days – flights are delayed or cancelled, causing significant inconvenience for passengers, flight crews, and maintenance teams. Not to mention, these kinds of delays affect an airline’s brand, and they also cause a significant revenue loss.

 

Real-time visibility and transparency needed

 

The ripple effect of disruptions and the almost inevitable limited recovery experienced today is a result of a lack of transparency across an airline’s network of data. Each change (whether for a plane, crew member, or passenger) is associated with the extensive compliance regulations and business rules mentioned above. Hence, a manual rescheduling process takes on a ripple effect – each change affects another and another in the complex interwoven web of data.

 

To accommodate all these nuances, airlines need a tool that incorporates:

  • A centralized data hub that collects and integrates the data on flights, aircraft, crew, and passengers with real-time, up-to-the-moment visibility into all data sources
  • Scientific algorithms that model all the rules and regulations governing each entity
  • A flexible fast engine to process all of the above instantaneously
  • Optimized and embedded rules algorithms and processes for producing new compliant schedules

 

So what would that look like?

 

When data science and SAP HANA combine forces

 

The SAP PSG Data Science organization took on the task of resolving disruptions in airline schedules using an operations research approach. They developed functionality that uses embedded rules algorithms and processes running on the SAP HANA database to enumerate new airline schedules for aircraft, crew, and passengers in near real time.

 

There are three major components to this new functionality:

 

The collection and integration of all the data, right now, in one place. To rapidly resolve disrupted schedules, an airline needs to see comprehensive data on three pertinent dimensions – planes, crew, and passengers – all at once, as it changes in real time. This is where the in-memory computing power of the SAP HANA database comes in. This functionality uses SAP HANA as a data hub to integrate data from various systems, including MRO, HR, flight and ground operations, reservations, airline partnerships, and connecting flights, into a single source of truth.

DOO_Cockpit.jpg

 

The modeling of each and every existing rule. Airlines are governed by an enormous number of regulations, such as Federal Aviation Administration (FAA) mandates for crews in regard to such things as flight-time limitations, layovers, and compulsory training. Typically encased in hundreds of pages of documentation, these rules are not readily available in any computerized system today. Then there are airline-generated business rules for passengers, such as VIP status, rewards programs, and flying preferences. In addition, there are closely monitored dictates for aircraft safety, such as mandatory maintenance schedules, repairs, and inspections. SAP has developed a rules processor with scientific optimization criteria that utilizes all of these rules and regulations and then overlays them on the data hub to help determine real-time compliant resolutions.

 

The optimization of ideal rescheduling scenarios. With integrated real-time data and a comprehensive data model, the SAP data scientists then created algorithms and an optimization engine to manipulate the pertinent information and arrive at potential solutions. Business intelligence tools are then used to enumerate a sophisticated, optimized rescheduling of aircraft, crew, and passengers within seconds or minutes. The results can be manipulated easily so that other options can be considered as well, with instant visibility into the current situation and continuous alerts affecting scheduling options, regardless of the underlying data source.

 

This process allows for a much faster recovery from disruptions – but does it work in real life? To see the full power of this functionality, the next blog

(Day of Operations Recovery in Action (Part 2 of 2)) will take a deeper look into how an actual recovery unfolds.

Hi All,

I had a requirement to find the date difference between the field LETZTZUG of table S032 and the current date.

Here the problem was getting the current date. I got the current date using the now() function in the calculated column functions.

The output of now() was in yyyy-mm-dd hh:mm:ss format. I used date(now()) and got it as yyyy-mm-dd.

With this I was still not able to use daysbetween() to find the due date between LETZTZUG and the current date.

Then I used the replace() function to eliminate the '-' characters from the yyyy-mm-dd value. After that I could calculate daysbetween().

 

This overcomes the problem of not having to_dats available in calculated columns.
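Putting the pieces together, the calculated column expressions described above look roughly like this (a sketch of the expression-editor formulas; exact function signatures can vary by revision):

currentdate = replace(string(date(now())), '-', '')
netduedate  = daysbetween("LETZTZUG", currentdate)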

 

dd0.png

 

dd1.png

 

 

 

 

 

dd2.png

 

And we have the net due date here.

dd3.png

 

 

I hope this will be helpful for others. Experts, please provide your suggestions on this.

 

Thanks & Regards,

Vijay


Data Platform - Why Now

Posted by Priya Ranjan Dec 5, 2014

What is a Data Platform

Databases and their capabilities are well known, but is a database the Data Platform? The simple answer to that question is no. A Data Platform is more than just a database. A database is an infrastructure component used to store your data with all the necessary constraints and resilience, which are minimal requirements in the enterprise nowadays.

A Data Platform is built on top of several components, with a database at its center. It uses the database as part of the infrastructure beneath, but provides many functional capabilities on top which can substantially improve the business functionality of the many applications that ultimately hit the Data Platform.


Why Data Platform and Why Now
Whether it be banking, insurance, utilities, pharmaceuticals or any other domain, data is the net asset of the IT systems in any organization. Many would argue that it is the applications which provide business functionality and give data its contextual meaning by molding the raw data in a domain that makes it relevant. But the truth of the matter is that with a Data Platform we push certain aspects of the applications into the platform itself, making the applications lighter and far more functional. This means there is a blurring of lines and a fusion between applications and the Data Platform.

For several years, and for several reasons, this idea has existed but did not get adopted en masse, due in my opinion to the following factors.


Databases' unfulfilled promise to grow up into a Data Platform

Initially, databases were very proprietary, developed certain extensions to the query language which were not portable, and did not build higher-order functionality which applications could leverage. They did not support all data types, data stores and database types. They did not provide granular security features, high-ingestion mechanisms, self-cleansing data pipelines, event sourcing, capabilities to build expert systems (rules), machine learning capabilities or validations. These tasks then shifted to other platforms and applications, and a whole industry was created to fill this vacuum.

 

Databases and storage were very expensive

Earlier, databases and storage were expensive and the technology was limited, and this imposed several financial and resource constraints. Application servers were cheap, could be replicated, could do a lot of further data processing if required, could scale, and could be replaced easily. This led to databases being a simple (read: dumb) data store and not the Data Platform they should have become.

Not scalable, parallel and distributed enough to handle complex business

Databases were not scalable beyond a certain limit, were not distributed, and were hard, expensive and time-consuming to optimize. Applications, on the other hand, could easily be load balanced and clustered, and were cheaper to create, manage and even throw away if there was a better solution.

Refactoring had an enormous impact

Schema changes had a wide impact, and no refactoring was possible because applications had SQL littered all over them in string format. Read and write operations in CRUD applications happen with SQL on the same set of tables, so any change to these table schemas impacts the applications.

Most applications were CRUD applications: all necessary data was brought to the user interface, interlinked and contextualized in the UI, decisions and validations were made there, and the data was then fed back to the database.

Vendor lock-in

Database vendors used clever techniques to lock their customers in in the early days. This created mistrust among customers and developers, who chose to use a minimal feature set of the database and write most of their data-munching code inside the application itself. This is somewhat true today as well.

 

The big question is why we are debating and discussing the Data Platform NOW. Did we suddenly discover the awesomeness of SQL, and that data is our core asset in the enterprise? What happened that suddenly people talk about data platforms and big data all the time, everywhere, and that analytics has become the de-facto hype?

In my opinion two things contributed vastly

  • Economics

Economics always has a big impact on decisions related to technology. Many companies could not afford the IT infrastructure that they can now. After all, your smartphone today is as powerful as the fastest computer 15-20 years back. It was said that developers will always be more expensive than the system, and hence developer productivity is of significant importance, as systems become cheaper over time and pay for themselves in a very short span of time.

With the advent of the cloud, you can rent the IT infrastructure at each layer of hardware, platform and software. This makes it easy for many organizations to push some of their auxiliary applications into the cloud. (I'm of the opinion that core data should be managed in the private cloud and not in the public cloud; applications, in particular external-facing and mobile applications, should be managed in the public cloud – for now at least.)

 

  • Internet

The internet is bringing in new economies, and this is evident all around us. It is changing access mechanisms and shaping customer behaviour significantly. Slowly but surely, all businesses and services will be accessible via the internet, and people at large will use the internet ubiquitously to access services. Even machines will provide and exchange their information to perform maintenance and bring in a kind of efficiency not seen before. (In my opinion, a toaster or fridge communicating over the internet is not IoT; smart meters, oil & gas, water utilities, aviation equipment, public health systems and smart cities constitute some of the ideal use cases for IoT.)

The internet also brings the data deluge, and with it the question of privacy – something that all companies should think deeply about, making anonymization and security of data a top priority.

 

But still the question remains as to why we need a Data Platform. We still have the capability to manage large amounts of data for our applications and analytics, and we could carry on along the same lines, scaling our systems to incorporate the new economic phenomena observed earlier.

 

The question is not one of technology but of strategy. In order to stay competitive and relevant you need a full view of the enterprise at any moment, and the granularity of information should be such that decisions can be made swiftly and, in many cases, automated. Automation of the lower-level tenets of the business is central to organizational growth in any sector, as consumption increases and brand loyalty becomes fickle after one mistake. Automation will become the key to innovation and customer satisfaction as customer expectations change and the workload on employees increases.

1.PNG

The data silos in organizations happened because each application in your organization works with its own database, with its own schema structure. You do not have a full, holistic view of your company, and therefore you ETL the entire data set periodically to a warehouse to be analyzed later. This periodic batch ETL of data misses some of the events and real-time, proactive reactions that an organization could have had to events from its most valued customers or new customers. BI tools then munch on this data to produce reports which have very little relevance as time progresses. You need not only the data but the whole situational context under which the events happened in order to learn from your data; otherwise the data will not speak to you and will be irrelevant, as the situation in the present and the future will be very different from the past.

 

This all happened because we did not have the technology at our disposal. We had various components, but IT got overwhelmed by the myriad complexity of interlinking these components, which it was unable to manage and maintain, and hence retreated into a sulking mood of not innovating, as innovation would bring in more and more complexity. Why do you think mobile, despite being such a hot requirement, has not taken off in the enterprise? It should have been here yesterday.

 

2.PNG

Conclusion

Complexity kills. Complexity sucks the life out of users, developers and IT. Complexity makes products difficult to plan, build, test and use. Complexity introduces security challenges. Complexity causes administrator frustration.

– Ray Ozzie

As we did not have a coherent data platform, the business and technological requirements could not be encapsulated and hence leaked out into myriad application suites, integration suites, complex event processing, business intelligence and warehousing. The whole discussion and effort shifted to managing these systems rather than the business requirements.

 

As we were limited by the constraints of the infrastructure, we could not build a single holistic view of the entire enterprise, so we split the functionality into several applications and then did the plumbing to make it all work together. This led to multiple versions of the truth, no business process management strategy, and multiple batch jobs shipping data which should have been worked on yesterday.

 

But now we are entering a technological phase where distributed computing, which was once only in the realm of major research and universities, is becoming a universal way of doing compute. We are entering an age of architecture where we can replay data to any point we desire, and we can pause, rewind and fast-forward into the future with predictive analytics. Our applications will no longer be just dumb CRUD applications, but will become smart applications with highly contextualized information at their disposal to automate or assist users in making critical decisions.

 

Many organizations can choose to sit on the sidelines and wait until the change is thrust upon them and complexity overwhelms them to the point of paralysis, or they can start rethinking and building a data platform for their own organization and re-invigorate it with automation, learning and adaptation.

There are 2 ways to enable the Search features in SAP HANA:

 

1)     Create attribute views in HANA and consume them in the HANA Search UI Toolkit using the SInA API.

 

For more info please visit this blog Create your own application site using SAP HANA UI Toolkit

 

2)      Starting with HANA SPS9, we can also create CDS models with search features enabled by using some annotations and consume them in the search UI (currently the SHINE application) using OData services.

 

CDS1.PNG

 

In this blog we will discuss how to enable the search features using CDS models.

 

Procedure:

 

-     Defining tables and views using CDS annotations

-     Full-Text searching in OData

-     Consume the CDS views in the Search UI

 

Step 1:  Defining Tables and Views using CDS annotations:

 

We create tables and views using .hdbdd files in an XS Project.

The XS project should also contain the other mandatory files: .xsaccess, .xsapp and an .xsodata file.

 

More info regarding CDS is available at: http://help.sap.com/hana/SAP_HANA_Core_Data_Services_CDS_Reference_en.pdf

 

Currently the below Search Annotations can be used to create CDS tables:

 

@SearchIndex.text - Enable text search on an element of a CDS entity

@Search.textAnalysis – Enables the Text Analysis on the columns

@Search.fuzzinessThreshold - Enable fuzzy search for the given element

 

These Search Annotations can be used to create CDS views:

 

@Search.defaultSearchElement - Element is used for freestyle search

@Search.fuzzinessThreshold - Enable fuzzy search for the given column

@Search.ranking - Define the ranking weight for the column

@EnterpriseSearch.usageMode - Define column as auto facet column

@Search.highlighted - Element is used for highlighting the search results

@Search.snippets - Element is used for Snippets functionality in the search results
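Under the hood, these annotations enable HANA's full-text and fuzzy search on the underlying columns. As a point of reference, an equivalent ad-hoc SQL query against such a view would use the CONTAINS predicate, for example (hypothetical view and column names):

SELECT * FROM "MYSCHEMA"."PRODUCT_SEARCH_VIEW" WHERE CONTAINS("TEXT", 'notebok', FUZZY(0.8));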

 

This is a sample CDS table. For more info please go through the CDS documentation.

 

CDS2.PNG

 

Sample CDS view -

 

CDS3.PNG

 

Once we have all the files in place, we can activate the project, which creates the tables and views we defined in the .hdbdd file.

 

These CDS views, which have the search features enabled, can be consumed in applications.

 

Currently we have an App in SHINE “Full-Text Search with SAPUI5, ODATA and CDS” which consumes the CDS view.

For more info on SHINE please refer to the documentation: http://help.sap.com/hana/SAP_HANA_Interactive_Education_SHINE_en.pdf

 

This is how the UI looks. You can see the data from a CDS view created by joining 2 CDS tables with search annotations.

The user is provided with a search field to find products based on the product text. He or she can also refine the search by selecting the facets provided in the Facets column.

 

        CDS4.PNG

 

In the same way, users can also create their own search UIs using SAPUI5 and OData functionality for searching/filtering the data.


 

HTAP stands for Hybrid Transaction / Analytical Processing — and it’s the future of business applications.

 

The term was coined in early 2014 by analyst firm Gartner to describe a new generation of in-memory data platforms that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) without requiring data duplication.

 

For the last few decades, it has been considered best practice to keep operational and analytic systems separate, in order to prevent analytic workloads from disrupting operational processing. To achieve adequate speeds for analytic queries, data has typically been copied from one or more operational systems to a data mart or data warehouse.

 

This approach was challenged in 2009 with the publication of a white paper by Dr. Hasso Plattner entitled A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database. The paper argued that advances in in-memory technology enabled a new approach for standard business applications where data could be stored just once without compromising either transactions or analytical workloads.

 

HTAP systems are now commercially available, from major vendors such as SAP with the SAP HANA platform, and from smaller startups such as VoltDB, NuoDB, Clustrix, and MemSQL.

 

Gartner believes that HTAP systems can address the four major drawbacks of this traditional approach:

First, for analytics in HTAP, data doesn’t have to move from operational databases to data warehouses. Secondly, transactional data is readily available for analytics when created. Thirdly, drill-down from analytic aggregates always points to fresh HTAP application data. Finally, you eliminate or at least cut the need for multiple copies of the same data.

Gartner emphasizes that in-memory databases are very different from existing databases simply fitted with flash disks or in-memory caches. To get the full benefits of in-memory, databases need to take an entirely different approach from existing disk-based systems. This may be why, even though most of the major database manufacturers now offer in-memory processing options, these are typically for analytics only and require a copy of the data. This in turn means there is a delay between when a new transaction is written to the database and when it is available for analysis – i.e. they are not HTAP systems.

 

While some industry analysts continue to assume in-memory systems are a "luxury" that is only justified in situations requiring real-time processing, Gartner believes HTAP approaches are more cost-effective, with the higher hardware costs more than outweighed by lower operating costs and greater flexibility. This seems to be confirmed by the increasing number of organizations that have announced they are moving their business applications to in-memory HTAP systems.

 

After extensive research, Gartner has concluded that industry disruption through HTAP technology is set to grow:

Hybrid transaction/analytical processing will empower application leaders to innovate via greater situation awareness and improved business agility. This will entail an upheaval in the established architectures, technologies and skills driven by use of in-memory computing technologies as enablers.

Many Big Data architectural approaches such as Data Lakes continue to separate transactional and analytical activities. The rise of HTAP applications makes it clear that such approaches are only part of an overall information architecture strategy.

 

[This post was first published on the Business Analytics Blog]

OData Protocol

 

OData is an open standard protocol that allows service providers to define access to their resources in a standardized manner. The service definition is available via the service metadata document (EDMX).

OData allows resources (Entity Sets) to be identified using Uniform Resource Identifiers (URIs) and defined in an abstract data model.

 

E.g. if the user wants to access the Employee with primary key 1, the URI would be http://<server name>/<service name>/Employees(1), used with the HTTP GET operation.

 

The data format for exchange is ATOM XML or JSON

 

What makes the OData protocol very flexible is the availability of associations, which define the relationships between Entity Sets. For example, let us say that when retrieving Employees you would also like to get information about their HR; if this (HR) is defined as an association, then this is possible in one single call, simply by modifying the URI appropriately.
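
For example, assuming the association to the HR information is named HR, a single call with the standard $expand query option returns the Employee together with the associated HR data (the names follow the generic URI pattern above and are only illustrative):

http://<server name>/<service name>/Employees(1)?$expand=HR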

More details about OData protocol at http://www.odata.org/

 

OData Adapter in HCI

 

With the OData Adapter available in the receiver channels of HCI you can connect to any OData service provider and perform the operations required for your integration scenario. The HCI platform provides its full pipeline capability when using the OData Adapter. When data is fetched from the OData service, either in ATOM-XML or JSON format, the Adapter automatically converts it to XML, and the payload can then be used with any of the steps (e.g. mapping) available in the platform. Similarly, when sending data back to the OData service via the PUT or POST operation, the data is converted back from XML to JSON or ATOM-XML automatically by the Adapter.

The OData Adapter in HCI supports the v2 version of OData Protocol.

 

OData Adapter Settings

 

1. OData Adpater General.jpg

2. Adapter Specific.jpg

Some key features

  1. The Adapter currently supports both Basic and Certificate Based authentication to the OData service.
  2. It allows you to specify a Page Size. This feature is useful in case the OData service that you are connecting to has problems with large payload sizes. Using Page Size, the Adapter fetches data in smaller chunks from the OData service (internally it uses the top and skip query options); see the example after this list.
  3. For integration developers who do not know the complete semantics of the OData protocol or the OData service, there is a simple, easy-to-use Operations Modeler. The Operations Modeler parses the service definition in the edmx and allows users to define the URI and operations via a wizard. The Modeler also generates an XSD file, which is the metadata of the payload received or to be sent. This XSD can be used in further steps like mapping.
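
For illustration, a Page Size of 200 would translate into a sequence of calls like the following (hypothetical service and entity names, following the generic URI pattern shown earlier):

http://<server name>/<service name>/Employees?$top=200&$skip=0
http://<server name>/<service name>/Employees?$top=200&$skip=200
http://<server name>/<service name>/Employees?$top=200&$skip=400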

 

OData Operations Modeler

 

  • Connect to the OData service to download the edmx OR upload the edmx of the service to src.main.resources.edmx and use it in the Modeler

     3a Operations Modeler Connect.jpg

  • Choose the Entity on which you would like to perform the required Operation. In case you would also like to select the associated Entities, choose the number of levels.

          3. Operations modeler Enityt select.jpg         

  • Decide on the operation and fields that you need to work on.

     4. Operations Modeler - Operation.jpg

    • In case of Query operation you can also decide to use top and skip
    • In case of Read, Merge and Update the mandatory Key fields are automatically populated.
    • In case of Create, Update and Merge operations the Adapter also supports batch requests. An OData batch request allows you to send multiple records / operations in one single request as batch sets. The Operations Modeler ensures that the generated XSD is defined in a format that allows sending of batch records. For more details read Note 2031746.

 

  • In case of Query Operation you can filter and sort the dataset

          5. Operations Modeler - Filter.jpg

          6. Operations Modeler - Sort.jpg

         

          The filterable and sortable fields are prefilled, thereby allowing you to simply model the operations.

 

  • Clicking Finish generates the XSD file, which describes the format of the data exchange and can be used in further steps like Mapping.

 

 

Note: You can access some test services from http://services.odata.org/

 

OData Support in SuccessFactors Adapter

 

SuccessFactors also exposes its APIs as OData services, in addition to SOAP-based services. The SuccessFactors Receiver Adapter has been enhanced to also support the OData-based services.

 

7. SFSF OData General.jpg

 

By using OData V2 as the Message Protocol in the SuccessFactors Adapter you can connect to and work with the OData services exposed by SuccessFactors. SuccessFactors OData follows the same open standards, so all of the information mentioned above is also applicable to the SuccessFactors OData service.

 

The SuccessFactors OData service does not support the batch operation. Instead, a custom OData import function called UPSERT is supported. The UPSERT operation takes care of creating or updating records without you having to distinguish between the two explicitly. It also allows sending multiple records in a single request.

 

9. SFSF OData - Upsert Modeling.jpg

(see Comparing the Data Warehouse approach with CalcViews - Overview for the initial thoughts and the summary)

Order Entry Amount per period for a customer

Since Hana Live is the new kid on the block, let us start with this. I want to know the sales order revenue per period for a given customer. The Hana Live view best suited is "_SYS_BIC"."sap.hba.ecc/SalesOrderItemNetAmountQuery", containing all the fields we need. Also, the underlying tables of the system used are quite large: 1.7 million sales order rows (VBAK) with 4.7 million sales item rows (VBAP) across multiple clients; we filter on one client, which holds about a third of the data.
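
As a rough sketch, the query might look like the following. SoldToPartyName and SalesOrder are field names referenced later in this post; CreationDate and NetAmountInDisplayCurrency are assumed names that may differ in your release, and the view may additionally expect input parameters (such as the SAP client) via the PLACEHOLDER syntax.

SELECT YEAR(CreationDate) AS OrderYear,
       MONTH(CreationDate) AS OrderPeriod,
       SUM(NetAmountInDisplayCurrency) AS OrderEntryAmount   -- assumed measure name
FROM "_SYS_BIC"."sap.hba.ecc/SalesOrderItemNetAmountQuery"
WHERE SoldToPartyName = 'Ama Inc.'                           -- example customer
GROUP BY YEAR(CreationDate), MONTH(CreationDate)
ORDER BY OrderYear, OrderPeriod;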

 

The query time is 0.3 seconds – perfect. Why would I ever want to copy all data to a Data Warehouse for such a query?

 

Example 1-1.png

 

Order Entry amount per period for all customers

In the previous query data from a single customer was selected only. What about large amounts of data, say total revenue of all customers for 2012 and higher?
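
As a sketch, under the same column-name assumptions as before:

SELECT SoldToPartyName,
       SUM(NetAmountInDisplayCurrency) AS OrderEntryAmount
FROM "_SYS_BIC"."sap.hba.ecc/SalesOrderItemNetAmountQuery"
WHERE YEAR(CreationDate) >= 2012
GROUP BY SoldToPartyName;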

The query time is again sub-second, around 0.5 seconds in fact. Amazing, isn’t it?

Example 2-1.png

 

Order Entry amount for all periods, all customers

How about the complete data, no Where-clause at all?

This query leaves mixed feelings on my test system. Obviously the ERP system is missing a conversion rate from RMB to EUR for the date 2011-01-25, and because of that the entire query fails.

Example 3-1.png

It seems to be that single day only, as when excluding it, the query returns data.

 

Example 3-2.png

Performance-wise we are getting closer to 1 second with this. Still good, but if we went for more complex queries it might be hard to stay below our goal of sub-second response times.

What can we do now? We just learned that, in exceptional cases, a currency rate might be missing. We could call up the IT department and tell them to upload the missing conversion rate. But actually this is not that uncommon; maybe the Calculation View should deal with that case itself and use the rate of the previous date? So we need to make the transformation, and therefore the Calculation View, more complex. And since the logic of a View is applied whenever somebody queries the data, performance will go down. Maybe just a little, maybe rendering the View performance unacceptable.
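
For illustration, a sketch of the kind of fallback logic the View would need, using a hypothetical, simplified rate table instead of the real ERP currency tables:

-- Rates(FromCurrency, ToCurrency, RateDate, Rate) and SalesItems are hypothetical
SELECT i.SalesOrder,
       i.NetAmount * r.Rate AS NetAmountEUR
FROM SalesItems i
JOIN Rates r
  ON  r.FromCurrency = i.TransactionCurrency
  AND r.ToCurrency   = 'EUR'
  AND r.RateDate = (SELECT MAX(r2.RateDate)              -- latest rate on or before the order date
                    FROM Rates r2
                    WHERE r2.FromCurrency = i.TransactionCurrency
                      AND r2.ToCurrency   = 'EUR'
                      AND r2.RateDate    <= i.CreationDate);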

In a Data Warehouse these transformations would be performed whenever the changed data gets loaded. So the transformation would be done just once, and the time the transformation takes is less important as no one is waiting for it.

 

Conclusion 2

When using Views, you are limited in the number and complexity of transformations. It is obvious that the more complex a View gets, the longer it will take to return the data to the end user. If, on the other hand, the transformations are done upfront while loading the Data Warehouse, the ETL run might take a second longer due to the additional logic, but it would have no impact on the query performance.

The big question is what kinds of transformations are common and what their impact on performance is. So let us have a look at that in the next example.

 

Joining master data

The most common transformation is the join. Actually, each Calculation View itself already requires joins of various tables; that is the reason the queries above took as long as they did. Now we want to add more tables, e.g. we change our query so that it returns the sales per material number and text.

Example 4-1.png

Just because of one additional join, the response time grew by roughly 30%, from 0.862s to 1.117s.

Adding Customer and Customer Address did increase the response time to 1.4s.

Example 4-2.png

Unfortunately you often require joins even if no data is selected from a table, because the table has the key to the next table. If for example the user wants to see the amount per customer region, he needs the VBAP table as this contains the amount. From there he joins the VBAK order header table to get the customer id. This can be translated to an address id using the KNA1 table. With the address id stored there you can join to the ADRC table containing the region code, with which the T005U table can be read to get the region name.

VBAP->VBAK->KNA1->ADRC->T005U

Four joins are required in that example just to get to the region name. In a Data Warehouse you would probably store more keys in the sales line item table, e.g. the customer region key. As a result you would simply join the enriched VBAP table directly to the T005U table. Or add the address with all its sub information directly to the customer record.
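
As a sketch of the difference (client field MANDT and language handling omitted for brevity; SalesItemFact is a hypothetical enriched copy of VBAP):

-- View-style query: four joins just to reach the region name
SELECT t.BEZEI AS Region, SUM(p.NETWR) AS Amount
FROM VBAP p
JOIN VBAK k ON k.VBELN = p.VBELN                 -- order header holds the customer number
JOIN KNA1 c ON c.KUNNR = k.KUNNR                 -- customer master holds the address id
JOIN ADRC a ON a.ADDRNUMBER = c.ADRNR            -- address holds country and region code
JOIN T005U t ON t.LAND1 = a.COUNTRY AND t.BLAND = a.REGION AND t.SPRAS = 'E'
GROUP BY t.BEZEI;

-- Data-Warehouse-style query: the region key is already stored on the enriched line item
SELECT t.BEZEI AS Region, SUM(f.NETWR) AS Amount
FROM SalesItemFact f
JOIN T005U t ON t.LAND1 = f.COUNTRY AND t.BLAND = f.REGION AND t.SPRAS = 'E'
GROUP BY t.BEZEI;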

Downside of that approach would be that whenever a customer gets a new address id assigned, many rows need to be updated. The order entry system would be slowed down if suddenly all the orders of the customer need to get updated, but in a Data Warehouse doing that when loading the changed data into the Data Warehouse is no problem.

 

Conclusion 3

Joins are surprisingly expensive operations. It does not matter whether the View itself requires many tables or whether Views are joined together; it is the number of joins and the amount of data that matters. One of the major advantages of a Data Warehouse is the reduced number of joins required in queries.

 

Searching for data

A simple search might be based on a primary key, e.g. we search for one particular sales order.

Example 5-1.png

Why does this take 0.6 seconds??? In the first example the query had a where clause on the SoldToPartyName, and that completed in 0.266s! Can anybody explain?

What does a search actually look like in Hana?

1. Go through the ordered list of unique values of the column (the dictionary) and find the index number of the searched value. (Key Compression)

2. Scan through the column values and find all occurrences of this index number. (Scan)

So there is one important difference between the Example 1 query and this search on the order number: The data distribution.

While 1.6 million rows in the SAP Client represent 636’000 sales orders, they all belong to just 117 customers. As a result the first step, searching in the unique values has to deal with a totally different amount of values.

Example 5-2.png

But that cannot be the full explanation. In the ordered value list Hana can search for a given value very quickly and even if not, it is just 600k records, that’s nothing for Hana.

What datatype is the SalesOrder column? It is a varchar containing order numbers like ‘00001000’. The where condition we used compared that with an integer number. Therefore, in order to find the matching unique values, Hana cannot narrow down on the value in the ordered list; instead it has to take every single value, convert it to a number and check whether the number matches the condition. That takes a few milliseconds.

Simple to prove; let’s compare with the actual string value instead: 0.088s.

Example 5-3.png
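
The two variants side by side (same view as above; the order number is only illustrative):

-- Implicit conversion: every value in the dictionary must be cast to a number first
SELECT * FROM "_SYS_BIC"."sap.hba.ecc/SalesOrderItemNetAmountQuery"
WHERE SalesOrder = 4113377;          -- integer literal: slow

-- Comparison against the stored string: the sorted dictionary can be searched directly
SELECT * FROM "_SYS_BIC"."sap.hba.ecc/SalesOrderItemNetAmountQuery"
WHERE SalesOrder = '0004113377';     -- string literal with leading zeros: fast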

In essence this example boils down to the question of how often the View provides the real data and how often it returns data to which some functions have been applied. If you filter on the real data, using the real values in the condition (SalesOrder = ‘0004113377’), then Hana can utilize its unique value index, assuming the optimizer is clever enough. But if the column being filtered on is built on the fly, e.g. concatenating CustomerName with CustomerID into a string like ‘Ama Inc. (00003523)’, and a filter is applied to this, then the slower approach has to be taken; in the worst case the query has to be executed on all data in the table and filtered later.

In a Data Warehouse these data preparation steps (converting the datatype, beautifying strings, etc.) would all be done during the ETL phase, and hence the target database would store the already converted data. As a result the search performance is good in all cases.

 

Conclusion 4

Searching data coming from Views can be expensive, sometimes as expensive as reading the entire tables even if only a single record is found at the end. In a Data Warehouse, on the contrary, all columns of the view are stored in a table, and hence it is a normal search using constants, which Hana is excellent at.

 

 

Data consistency with foreign keys

Let’s have a look at what our view actually does.

Example 6-1.png

It is a tree, reading a table at the bottom and then joining new data to it and using projections. While many joins are inner joins, meaning e.g. a sales order line always has a sales order header, quite a few relationships between tables are not enforced and hence require an outer join. Like Distribution Channel: yes, a sales order can have a distribution channel, but it does not have to.

Executing a query that does include an outer join is usually more expensive. We can test that by simply selecting the distribution channel as well: 1.6 seconds.

Example 6-2.png

In a Data Warehouse all foreign relationships are made valid during the load. A sales order has no distribution channel? Then put the distribution channel “?” into the key column. A sales order has a value in the distribution channel column and there is no matching partner in the distribution channel table? Add a row into the distribution channel table with all values being a default, e.g. distribution channel name is “?”.

The extra time for straightening out the foreign key relationships is spent during the load of the record instead of at every single query execution.
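
As a sketch, with hypothetical Data Warehouse table names, the load-time fixup could look like this:

-- Every fact row gets a valid (if unknown) distribution channel key
UPDATE SalesItemFact
SET    DistributionChannel = '?'
WHERE  DistributionChannel IS NULL;

-- The reference table gets a default member, so an inner join always finds a partner
INSERT INTO DistributionChannelDim (DistributionChannel, Description)
SELECT '?', '?' FROM DUMMY
WHERE NOT EXISTS (SELECT 1 FROM DistributionChannelDim WHERE DistributionChannel = '?');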

 

Conclusion 5

With Views all foreign key relationships need to be dealt with at runtime. Is it impossible that a key does not have a partner in the reference table? Can the key be null? What should happen if it is? In a Data Warehouse these issues are identified and corrected at load time, greatly speeding up the query performance.

 

 

Dealing with doubtful data

What should happen when entries in the ERP system are not correct? The classic answer is: we want to show the incorrect data so users have the motivation to correct it in the source system and hence help even the ERP processes. Sounds logical at first, but we have seen that a missing currency conversion for a single day can cause an entire query to fail with an error. In this case the user did not even see the wrong data; he got an error and that’s it. He cannot even fix the error himself but depends on somebody else.

The other problem we noticed already is that dealing with such problems inside the view costs considerable amounts of execution time. Not for the one erroneous row, but because the query needs to be amended with logic dealing with such errors just in case.

Another example is when the transformation logic itself has a problem. In the past the revenue was calculated as quantity multiplied by price. What else should it be? Yesterday a service company got acquired and now we get sales orders for service contracts: a monthly payment of the service fee until the order is cancelled. Now the price * qty formula does not work anymore. This might not be noticed immediately, as the query still returns data, and the only way to fix the problem is by changing the Calculation View.

But then there are true data errors, e.g. price is 100USD instead of 1000USD. That is something the system cannot fix and for sure it would be a good idea to correct the price in the ERP table, not manipulating the data somehow.

Then there are other data errors that can be corrected automatically, e.g. a customer has a male first name but the gender is set to female or missing. If we follow the rule of “data should be fixed in the ERP system”, then the user noticing the problem would send an email to the sales department, asking them to provide the proper gender. As a result the sales department gets flooded with these little issues they do not even care about. All they care about is that the order can be shipped. Hence asking them for these tiny fixes won’t work. We can run an automated process trying to find inconsistencies and update the ERP table automatically. The sales department will not like that either, when some algorithm updates an important customer record or breaks something, e.g. the letterhead reads “Mr. Chris Johnson”, offending Christina Johnson. Dangerous.

And most importantly, quite often the Calculation View user does not even know he did something wrong.

A user might want to find out how many customers are in Beijing, China. Simple query: There are 15 customers.

Example 7-1.png

He would now draw the conclusion that a marketing campaign in this city does not make sense and spend the money elsewhere.

What the user did not notice, however, as it is not obvious, is that there are more rows with CityName = BEIJING.

Example 7-2.png

Just because the user did not know something, he has drawn the wrong conclusion. Is it the sales department’s fault to enter the CityName with two different spellings? Maybe. Will they correct all records? Unlikely. The shipments and the billing documents reached the destinations, from their perspective all is good.

But that is not the end of the story; the user should actually have executed the query below to find 8 more customers in Beijing, entered with the native spelling of the city name.

Example 7-3.png

But even that is not all as the next query shows.

Example 7-4.png

Four records have the wrong spelling, Bejing instead of Beijing. And 24 records have the correct spelling but the wrong country of DE. By looking at the street name, the postal code and the language flag you would know for sure that these are not German customers.
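
Taken together, a single grouping query would have surfaced all variants at once; the view name is hypothetical and the spellings are the ones from the screenshots above:

SELECT CityName, Country, COUNT(*) AS Customers
FROM CustomerAddresses                 -- hypothetical view over the customer addresses
WHERE UPPER(CityName) IN ('BEIJING', 'BEJING')
   OR CityName = N'北京'                -- native spelling of the city name
GROUP BY CityName, Country
ORDER BY Customers DESC;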

While for the latter case we can assume that this truly is a data error that should be corrected in the source, the spelling error is not that obvious. Of course it is wrong but does the sales department care enough to fix it?

There is a great danger in such wrong data. As long as it is obvious, it is no problem. But an error the user does not recognize leads to wrong conclusions. As a consequence the entire reporting solution is not trusted anymore, and then the project as such is considered a failure.

In fact that is one of the main reasons why Data Warehouse projects fail: the queries produce wrong data. Not because the ETL logic is wrong, but because of the source data. Performing the transformations in a Calculation View instead does not make this easier, nor does the problem go away. A service order is still producing the wrong amount figures. A user not knowing the various spellings of a city name is still selecting only a fraction of the data. Actually it is worse, as a Data Warehouse offers additional options:

  1. Since all changed data gets processed, data consistency checks can be made. Are all sales order types the ones we expect? Is the data consistent in itself, e.g. country – postal code – city – street?
  2. If problems are found we can notify the source system users. To some degree the source system can even be corrected automatically, for those records where there is no doubt, taking away lots of manual work from the people maintaining the data.
  3. Data can be enriched, e.g. in addition to the ERP city name an additional column with the standardized city name can be added.
  4. As the logic is executed for changed records and only once, way more sophisticated checks can be implemented.

 

The result would look like the screenshot below, where an additional column CityNameStandardized has been added. There the user can be certain that only one spelling is found for each city. This does not magically fix bad input data, but:

  1. Due to the standardization process, faulty records have been identified and can be brought to the sales department’s attention. (Rows 4 and 14)
  2. Valid records have a single spelling only (e.g. row 21)

Example 7-5.png

 

Conclusion 6

While showing the incorrect data from the source to make the problems visible has its merits, in reality this often leads to users not trusting the reporting solution any longer. Being able to explain each case does not help much when users lose their faith in the solution and hence stop using it.

 

Historical data

Imagine a query was executed yesterday, sum of revenue per country in 2011. The result was this:

Example 8-1.png

Today the same query is executed but the result is a different one: suddenly the amounts for the past have changed, and what belonged to Germany before is now in Austria!

Example 8-2.png

All is correct: the company with that large revenue relocated to Austria yesterday. Or in technical terms, the customer with the CustomerNumber 0000000187 got a new Address ID, and this is an address in Austria. As said, everything is perfectly good, but is this what the end user wants to see?

In a Data Warehouse one of the important decisions is where to keep history (and using which of the many options), where only the current data is requested, and where both history and current data are needed.

Using the above example, in the Data Warehouse there are two AddressIDs. One was added to the fact table directly, so it contains the AddressID as of the day of the booking. This is used for the historically correct query.

Example 8-3.png

And then the customer record has an AddressID; that is the current value.

Example 8-4.png
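
A sketch of the two query variants, using hypothetical Data Warehouse table and column names:

-- Historically correct: use the AddressID that was valid on the day of the booking
SELECT a.Country, SUM(f.NetAmount) AS Revenue
FROM SalesItemFact f
JOIN Address a ON a.AddressID = f.BookingAddressID
WHERE YEAR(f.CreationDate) = 2011
GROUP BY a.Country;

-- Current view: use the AddressID stored on the customer master today
SELECT a.Country, SUM(f.NetAmount) AS Revenue
FROM SalesItemFact f
JOIN Customer c ON c.CustomerNumber = f.CustomerNumber
JOIN Address a ON a.AddressID = c.AddressID
WHERE YEAR(f.CreationDate) = 2011
GROUP BY a.Country;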

 

Conclusion 7

Reports about past data can change at any time because most tables in an ERP system do not contain the full history; it is neither needed nor wanted there. But for the user it can be a great surprise and problem if a report about past years shows different numbers, just because the master record got updated. In a Data Warehouse we have the same problem, but there we have options to decide for each piece of data whether and how to keep the history. It needs to be factored in when building the Data Warehouse, however.
