Technology Blogs by SAP
wilson_kurian3
Advisor

There is some confusion around the different options available for managing data in BW. Hence I am writing this to ease that confusion and hopefully achieve a clear understanding of what options are available and which are best suited for what purpose.

In a typical non-HANA environment, most customers retain all of their data in SAP BW, and some retire older data to tape or disk, or to a near-line storage solution with a secondary DB.

When it comes to running SAP BW on HANA, the cost of putting all data in RAM can be high if the volumes are large. Moreover, not all the data needs to be in memory: typically only 30-50% of an organization's BW data is used actively for reporting and other operations, and that portion is the ideal candidate to fully utilize the in-memory capabilities of HANA. The remaining 50-70% is accessed infrequently and can therefore be managed on a lower cost plan.

SAP BW on HANA offers a number of ways to manage these different data temperatures so that you can achieve a lower overall TCO for your investment. This is an interesting area for customers because it encourages the adoption of an archiving policy which, when managed and maintained efficiently, can limit the need to buy more HANA capacity and thus save significant OPEX costs.

Broadly, there are 3 data temperatures:

HOT DATA


This is the area where 100 % of your PRIMARY IMAGE DATA is in the HANA in-memory space (RAM) and is instantly available for all operations.

In the BW world, these are typically the InfoCubes and Standard DSOs, which constitute the reporting and harmonization (EDW) layers respectively, as shown below. They are accessed very frequently for reporting and harmonization and hence are the ideal candidates to be fully in-memory and to benefit fully from the HANA capabilities.

Although access is very frequent, the data that is requested most often for reporting is typically no more than 2-3 years old. This most recently accessed portion is the real hot data that needs to stay in memory at all times to deliver top performance.

The older data (typically beyond 3 years) is rarely accessed for reporting but still has to be retained for regulatory and compliance purposes. This older data can therefore be safely archived to a very low cost plan using the COLD DATA management option with SAP IQ, as explained in the next section.
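The age-based split can be sketched in a few lines of Python. This is a minimal illustration, not BW behaviour: the `temperature` function, its `hot_years` parameter and the 3-year default are assumptions taken from the 2-3 year range mentioned above. In BW itself the decision is made by the archiving configuration of an InfoProvider, not by code like this.

```python
from datetime import date

# Assumed cutoff: the post cites 2-3 years as the actively reported range,
# so 3 years is used here purely for illustration.
HOT_YEARS = 3


def temperature(record_date: date, today: date, hot_years: int = HOT_YEARS) -> str:
    """Label a record of an InfoCube or standard DSO as HOT or COLD by its age."""
    # Approximate age in years; 365.25 smooths over leap years.
    age_years = (today - record_date).days / 365.25
    return "HOT" if age_years <= hot_years else "COLD"


print(temperature(date(2013, 6, 1), today=date(2015, 6, 1)))  # HOT
print(temperature(date(2010, 1, 1), today=date(2015, 6, 1)))  # COLD
```

Keeping the cutoff as a parameter reflects that the right boundary is a policy decision per customer, not a fixed rule.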


The data in the PSAs and w/o (write-optimized) DSOs constitutes the staging area and corporate memory. Although this data requires frequent access, it is used primarily for non-reporting purposes, i.e. for data propagation and harmonization. It can therefore be moved to a WARM store area, which is explained in the WARM DATA section.

The diagram below shows the areas where the HOT, WARM and COLD concepts apply in a typical SAP BW EDW architecture.

Access: VERY FREQUENT OPERATIONS that run anywhere from every few seconds to every minute to every hour

Response Time: REALLY FAST, fully in-memory

Use case: To provide fast access for queries, data loading, data activation, data transformation and data look-ups

Likely candidates: RECENT DATA from InfoCubes, Standard DSOs and Open DSOs, all Master Data, and Transformations and related look-up DSOs

COLD DATA


In this document I am only discussing SAP IQ as the cold store; for BW there are also other certified partners who provide Near-Line Storage solutions, such as PBS Software and DataVard. You can search for “NLS” on the partner site at http://go.sap.com/partner.html

This is the area where 100 % of your PRIMARY IMAGE DATA is in a SECONDARY DATABASE (ON DISK). The response is slightly slower than HANA, but it still offers reasonably fast READ ONLY access to the data for reporting purposes, as if everything were in one database.

In the BW world, the standard DSOs and InfoCubes constitute the harmonization and reporting layers. But typically only the last 2-3 years of data are requested frequently. The older data (typically beyond 3 years) is accessed very infrequently for reporting but still has to be retained for occasional reporting or for regulatory and compliance purposes. This older data can therefore be safely archived to a very low cost plan.

This is where NLS comes into play. Keeping the existing models and architecture the same, you can remove the older sets of data from these InfoProviders (typically by slicing the data according to time dimensions, or by moving the entire data set) from the primary HANA database to a secondary low cost/low maintenance IQ database. In most cases, READ access to IQ NLS is much faster than READ access to traditional databases. For customers running BW on xDB and using IQ as NLS, the NLS DB actually becomes an ‘accelerator’ and provides much faster response times than the primary database.
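The following Python sketch models the time-slicing idea for a single InfoProvider: older years move from the primary store to the NLS store, and a query still sees the full range. The `InfoProvider` class, its year-keyed stores and `archive_before` are hypothetical simplifications; in BW this is done through the NLS adaptor and the Data Archiving Process, and the sketch does not enforce that archived data is read-only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InfoProvider:
    """Hypothetical InfoProvider whose rows are grouped by year across two stores."""

    hot: dict[int, list[float]] = field(default_factory=dict)   # primary HANA database
    cold: dict[int, list[float]] = field(default_factory=dict)  # secondary IQ (NLS) database

    def archive_before(self, cutoff_year: int) -> None:
        """Move every year older than cutoff_year out of the hot store into the cold store."""
        # sorted() builds a list first, so popping from self.hot inside the loop is safe.
        for year in sorted(y for y in self.hot if y < cutoff_year):
            self.cold[year] = self.hot.pop(year)

    def query_total(self, from_year: int, to_year: int) -> float:
        """Sum a year range, reading both stores as if they were one database."""
        total = 0.0
        for store in (self.hot, self.cold):
            for year, rows in store.items():
                if from_year <= year <= to_year:
                    total += sum(rows)
        return total


sales = InfoProvider(hot={2010: [1.0, 2.0], 2011: [3.0], 2014: [4.0]})
before = sales.query_total(2010, 2014)
sales.archive_before(2012)
after = sales.query_total(2010, 2014)
print(sorted(sales.hot), sorted(sales.cold), before == after)  # [2014] [2010, 2011] True
```

The point of the example is that the model and the query stay unchanged; only the physical location of the older slices moves.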

The NLS4IQ Adaptor in SAP BW offers tight integration between SAP BW and SAP IQ, so that all data management, retiring and control processes can be done from SAP BW using the Data Archiving Process (DAP). Many enhancements have been added in the recent BW 7.4 SPx releases that make the entire end-to-end archiving life cycle much simpler and more efficient to manage.

As for SAP IQ itself, it offers columnar tables for faster read access, up to 90% compression, and runs on conventional hardware, giving a lower overall TCO. It is also a highly mature database with a large install base over the past 15+ years. This makes it a trusted environment for retiring old data as a low cost/low maintenance DB option, while still allowing that data to be accessed in near real-time whenever it is needed.
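To put the compression figure into numbers, here is a small Python helper. It treats the "up to 90%" figure as a best case; real ratios depend on the data. The `nls_footprint_tb` name and the 60% comparison value are illustrative assumptions.

```python
def nls_footprint_tb(raw_tb: float, compression: float = 0.9) -> float:
    """Estimate IQ disk space for raw_tb of data; 0.9 means 90% compression."""
    if not 0.0 <= compression < 1.0:
        raise ValueError("compression must be in the range [0, 1)")
    return raw_tb * (1.0 - compression)


print(round(nls_footprint_tb(2.0), 3))       # 0.2 TB at the best-case 90%
print(round(nls_footprint_tb(2.0, 0.6), 3))  # 0.8 TB at an assumed, more modest 60%
```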

Also, the SLAs for historical data are usually not as strict as those for high-availability data, so the NLS process helps by moving the bulk of the inactive data out of the primary database into an environment with more relaxed SLAs. Secondly, NLS provides an online archiving solution, so as volumes grow and data gets older, it can be seamlessly moved out of the primary HANA database. This reduces OPEX by significantly reducing the need to buy more HANA, and thus reduces the TCO of the landscape dramatically.

Access: SPORADIC, typically data that is older than 2-3 years but is still required for reporting, whether for regulatory, statistical or compliance purposes

Response Time: TYPICALLY 5-10 % lower performance than the HOT store

Use case: Data retiring, where you REMOVE part of your DATA (HISTORIC DATA) from your PRIMARY STORAGE and MOVE it to a low cost database, typically as an archiving scenario, while keeping the data available anytime and anywhere with near real-time access on request

Likely candidates: HISTORIC DATA from InfoCubes, Standard DSOs and SPOs


WARM DATA


This is the area where the PRIMARY IMAGE DATA is in the DISK storage of the HANA database instance but is always available on request. It lets you manage your LESS RECENT and LESS FREQUENTLY USED DATA more efficiently within the HANA database, so that the data is instantly available for READ, WRITE, UPDATE etc. (all operations) while still offering lower TCO.

In the BW world, PSAs and write-optimized (W/O) DSOs constitute the staging area and corporate memory. The data in a PSA is valuable only while it is the newest data. Once it has been loaded into the upper-level targets, its value diminishes, and it is only needed if there are discrepancies in report results and a trace back or reload is required. Although some customers do regular housekeeping, PSAs retain data for anywhere from a few days to a few weeks or months, depending on SLAs and policies. Their size can therefore grow very quickly, blocking a lot of space and memory that could otherwise be used for other important processes. Similarly, corporate memory is primarily used to support transformations, harmonizations, reconstructions etc., so it is only needed when such activities are taking place.

There are 2 options for implementing the WARM concept:

1. Non-Active Concept

The Non-active concept is available since SAP BW 7.3 SP8 and is primarily used to efficiently manage the available HANA memory space.

This concept primarily applies to PSAs and W/O DSOs. PSAs and W/O DSOs are partitioned by data request, which means that each complete data request is written to a single partition. Once the number of rows in a partition exceeds a threshold value, a new partition is created. The default threshold is 5 million rows for PSAs and 20 million rows for write-optimized DSOs.
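The request-based partitioning rule can be sketched in Python as follows. This is a simplified model that assumes the exact rule: a request is never split, and a new partition starts only after the current one has exceeded the threshold. The function and constant names are hypothetical, not BW objects.

```python
from __future__ import annotations

# Default partition thresholds (rows) quoted in the post.
PSA_THRESHOLD = 5_000_000
WO_DSO_THRESHOLD = 20_000_000


def partition_requests(request_rows: list[int], threshold: int) -> list[list[int]]:
    """Assign whole data requests to partitions, oldest request first.

    A request is never split. After a request is written, if the partition's row
    count exceeds the threshold, the next request opens a new partition.
    """
    partitions: list[list[int]] = []
    current: list[int] = []
    current_rows = 0
    for rows in request_rows:
        current.append(rows)
        current_rows += rows
        if current_rows > threshold:
            partitions.append(current)
            current, current_rows = [], 0
    if current:
        partitions.append(current)
    return partitions


loads = [2_000_000, 2_000_000, 2_000_000, 1_000_000]
print(partition_requests(loads, PSA_THRESHOLD))     # [[2000000, 2000000, 2000000], [1000000]]
print(partition_requests(loads, WO_DSO_THRESHOLD))  # [[2000000, 2000000, 2000000, 1000000]]
```

Because whole requests stay together, a partition can end up somewhat larger than the threshold, which is why the last request in the first PSA partition pushes it to 6 million rows.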


Using the non-active concept, PSAs and W/O DSOs can be classified as low-priority objects, so whenever there is a memory shortage, only the older partitions containing inactive data are quickly displaced from memory, freeing room for higher-priority objects and processes. The newest partition of a PSA or W/O DSO is never displaced; it always remains in memory for the operations required as part of the data loading process.
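Here is a simplified Python model of the displacement rule: under a memory shortage, the oldest partitions are chosen first and the newest partition is never chosen. It only illustrates the principle described above. In practice SAP HANA performs the unloading based on the priorities BW assigns, and the function name and MB sizes are assumptions. If the shortage exceeds what the older partitions can free, the sketch simply returns all of them.

```python
from __future__ import annotations


def partitions_to_displace(partition_sizes_mb: list[int], shortage_mb: int) -> list[int]:
    """Pick partition indices to unload, oldest first, until a memory shortage is covered.

    partition_sizes_mb is ordered oldest -> newest. The newest partition is never
    displaced, and nothing is displaced when there is no shortage.
    """
    displaced: list[int] = []
    freed = 0
    for index, size in enumerate(partition_sizes_mb[:-1]):  # skip the newest partition
        if freed >= shortage_mb:
            break
        displaced.append(index)
        freed += size
    return displaced


psa = [100, 200, 300, 400]  # MB per partition, oldest first
print(partitions_to_displace(psa, 0))       # []
print(partitions_to_displace(psa, 250))     # [0, 1]
print(partitions_to_displace(psa, 10_000))  # [0, 1, 2]
```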


Although the concept can be applied to InfoCubes and Standard DSOs, it is a HIGHLY UNRECOMMENDED OPTION. Please check SAP Note 1767880. Because cubes and standard DSOs are not partitioned by request, making them low-priority objects and displacing and reloading them does not work efficiently. As they can hold large volumes of data, whenever a load or activation is requested the entire table has to be brought back into memory, which causes a drop in performance. For these InfoProviders it is ideal either to keep ALL of their data in HOT, or to keep the newer data in HOT and move the older data sets to a COLD STORE like IQ using the NLS concept.

Access: MEDIUM FREQUENT DATA

Response Time: REALLY FAST if all partitions are in memory. If data has been displaced from the partitions and must be reloaded into memory, there is considerable lag depending on the data volume and the strength of the infrastructure. This is one of the key reasons why the non-active concept is not highly recommended for a very active data warehousing solution, as pulling data back into memory from disk has a negative impact on performance.

Use case: To efficiently manage the low-value data of PSAs and W/O DSOs in the HANA in-memory space and preserve the available HANA memory

Likely candidates: PSAs and W/O DSOs only

Some considerations:

* The non-active concept is not a way to retire or store older data on a low cost plan; rather, it is a way to manage the limited available memory more efficiently, so that when higher-priority objects/processes require memory, lower-priority objects are displaced and memory is made available for the higher-priority tasks.

* The non-active concept works only when there is a memory shortage. This means that an entire PSA or W/O DSO is always in memory unless there is a shortage, during which ONLY the older partitions are flushed from memory to disk; the recent/new partition always stays in memory.

* If data is displaced from the older partitions and a BW process later requires it, these partitions are reloaded into memory. This causes considerable lag depending on the data volume and the strength of the infrastructure, and is one of the key reasons why the non-active concept is not highly recommended in a very active data warehousing environment, as pulling data back into memory from disk hurts performance.

* The non-active concept does not reduce the in-memory sizing, as the objects still occupy the required space. A large number of such objects can block a significant amount of HANA memory. This is one of the main reasons why the Dynamic Tiering solution exists.

2. Dynamic Tiering

Dynamic Tiering is ONLY available from SAP BW 7.4 SP8 and HANA 1.0 SPS 9 onwards, and currently applies only to PSAs and W/O DSOs, with support for Advanced DSOs planned for the future.

Recall that the non-active concept only works when there is a memory shortage: an entire PSA or W/O DSO is always in memory unless there is a shortage, during which ONLY the older partitions of these objects are flushed from memory to disk. The recent/new partition therefore always stays in memory and occupies space. Also, whenever BW operations need to access the older partitions, they are brought back into memory and occupy even more space. Effectively, this concept occupies HANA memory at all times, and if it is overused there is a risk of slower performance and impact on other processes.

Dynamic Tiering is very different from the non-active concept. With DT, all data of a PSA or write-optimized DSO is 100% on disk, which means that the entire image of the object is in the PRIMARY disk. There is no concept of high-priority objects and no displacement mechanism. Effectively, the entire data of these objects is kept in a separate low cost area, while an integrated mechanism still provides access to it with optimal performance whenever required.

The tables in the DT concept are called extended tables (ET), and they sit in a separate warm store “host” on the same storage system, as shown in the diagram below. Logically, extended tables are located in the SAP HANA database catalog and can be used as if they were persistent SAP HANA tables. Physically, however, they are located in disk-based data storage that has been integrated into the SAP HANA system. The user sees the entire system as one single database, while the persistence of data written to an extended table is disk-based rather than main-memory-based: any data written to an extended table is written directly to the disk-based data storage.

DT offers a single, consolidated way of managing less frequent and less critical data at very low cost while still giving performance close to that of the hot store. This is possible because DT uses main memory for caching and processing, which brings in-memory performance benefits, while the data in the warm store is accessed using algorithms optimized for disk-based storage, which allows it to be stored on disk. All data load processes and queries are processed within the warm store, transparently for all operations, so no changes to BW processes are required.

Unlike the concept of Non-active, the main memory in SAP HANA is not required for data persistence (in extended tables). The concept of Dynamic Tiering can optimize the main memory resource management even further than the concept of Non-active data by completely moving the staging area data from the hot store to a separate low cost warm storage. This has a positive effect on hardware sizing, especially when dealing with a large quantity of warm data in the PSAs and write-optimized Data Store objects.
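The sizing difference between the two WARM options can be illustrated with a rough Python model of how much main memory a staging object's persistence occupies. It is a deliberate simplification of the description above: the `staging_ram_gb` function is hypothetical, it ignores DT's use of main memory for caching and processing, and the non-active shortage case shows the best case where every older partition has been displaced.

```python
from __future__ import annotations


def staging_ram_gb(partition_sizes_gb: list[float], shortage: bool, dynamic_tiering: bool) -> float:
    """Rough main-memory persistence footprint of one PSA or W/O DSO.

    Simplified model that ignores caching and working memory:
    - Dynamic Tiering: data persists in disk-based extended tables, so 0 GB.
    - Non-active, no shortage: every partition stays in memory.
    - Non-active, under shortage: best case, only the newest partition remains.
    """
    if dynamic_tiering:
        return 0.0
    if not shortage:
        return sum(partition_sizes_gb)
    return partition_sizes_gb[-1] if partition_sizes_gb else 0.0


psa = [10.0, 10.0, 5.0]  # GB per partition, oldest first
print(staging_ram_gb(psa, shortage=False, dynamic_tiering=False))  # 25.0
print(staging_ram_gb(psa, shortage=True, dynamic_tiering=False))   # 5.0
print(staging_ram_gb(psa, shortage=False, dynamic_tiering=True))   # 0.0
```

The non-active figure only drops when HANA is already short of memory, which is why it does not reduce the sizing the way extended tables do.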

Access: MEDIUM FREQUENT DATA

Response Time: Medium fast. Slightly lower performance than the HOT store

Use case: To efficiently manage low-value and low-frequency data within the HANA database and significantly lower the overall HANA memory footprint

Likely candidates: PSAs and W/O DSOs only, with Advanced DSOs in the future

* Currently there are certain limitations to using Dynamic Tiering in a true Data Centre operation, because of the limited scope for Disaster Recovery and limited automation for High Availability, but fuller support for these is intended to be made available with HANA SP10.

Summary


Comparing the 2 WARM options, the Non-active concept and the Dynamic Tiering concept: the non-active concept has overheads in terms of HANA memory sizing and can cause performance drawbacks if overused, whereas Dynamic Tiering largely replaces it by allocating dedicated disk-based storage that can manage large volumes indefinitely at very low cost while still delivering close to in-memory performance.

Dynamic Tiering is an area that holds current data, demands frequent access and supports all normal HANA operations (READ, WRITE, CREATE, UPDATE, DELETE etc.). The DT concept differentiates between the less critical and the most critical layers of the EDW, giving the less critical layers dedicated storage while still managing it as one integral part of the solution.

The COLD store, by contrast, is an area that demands very sporadic READ only access and is ideally an online archiving solution that retains and maintains historic information at very low cost. The NLS concept differentiates between new and old data, moving the old data to a low cost COLD storage solution while maintaining tight integration with the primary database, so the data is always online for reporting.

So where are the savings? Let’s quickly look at an example below;

Let’s assume customer ABC needs a 1 TB BW on HANA system to migrate their current BW on DB system. If ABC retains all of that data in HOT, they will need 1 TB of HOT store licenses and 1 TB of HANA hardware. As volumes and requirements grow, they will need to invest further in additional HOT licenses and HOT memory hardware.

SAP BW on HANA solution = SAP HANA (HOT) only.

Instead, if we apply the WARM/COLD concepts and enforce a proper data management policy, we can split the data according to usage/value/frequency and maintain the less valuable portions in low cost storage. If we assume a 20:40 split for WARM/COLD, the HOT store requirement drops to merely 40% (0.4 TB of the 1 TB). As volumes and requirements grow, the low value/low usage/low frequency data can be pushed directly to the low cost storage systems without impacting the HOT store, avoiding the need to invest in further HOT storage licenses or hardware.

SAP BW on HANA solution = SAP HANA (HOT) + Dynamic Tiering (WARM) + SAP NLS IQ (COLD).
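The sizing arithmetic for customer ABC can be written out as a short Python function. The `split_footprint` name is illustrative, and the percentages are the assumed 20:40 WARM/COLD split from the example; the sketch only splits a volume by percentage and ignores compression, growth and licensing details.

```python
from __future__ import annotations


def split_footprint(total_tb: float, warm_pct: float, cold_pct: float) -> dict[str, float]:
    """Split a total data volume into HOT/WARM/COLD shares, in TB."""
    hot_pct = 100.0 - warm_pct - cold_pct  # whatever is not WARM or COLD stays HOT
    if hot_pct < 0:
        raise ValueError("WARM and COLD shares cannot exceed 100%")
    return {
        "HOT": total_tb * hot_pct / 100.0,
        "WARM": total_tb * warm_pct / 100.0,
        "COLD": total_tb * cold_pct / 100.0,
    }


print(split_footprint(1.0, warm_pct=20, cold_pct=40))  # {'HOT': 0.4, 'WARM': 0.2, 'COLD': 0.4}
```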

So SAP is effectively offering a fair proposition, with different components that complement each other and fit well into the EDW architecture of SAP BW running on SAP HANA, providing an efficient way of managing different data temperatures depending on their usage, value and frequency.
