MaxDB weekend magic - save 50% total storage space...

lbreddemann · ‎10-22-2010

People use assumptions to make decisions.

They do this all the time and in sofar developers are people as well 🙂

To be a bit more specific, also database developers make assumptions,
One of them is for example that when you query data then you ask for data that is actually there.
You want to get some data back.
Therefore, index-structures are optimized to answer this kind of question and not the "i-want-to-check-whether-this-really-does-not-exist" kind of query.
Go and try to optimize a SELECT * FROM TABLE WHERE COLUMN != xyz statement!

Another assumption is the following:
Most tables have primary keys that allow the unique identification of every row in the table.
AND (!) this primary key is rather short compared to the whole row size.

In MaxDB we find these assumptions represented in the way how indexes use primary keys as logical row references.
Given this logical referencing one can observe an interesting effect.

Let's take SAP standard table WBCROSSI (Index for Includes - Where-Used List Workbench).

On a standard installation this table can take up some space:

--------------------------- ----------------------
          Total Size in KB|      Number of Entries
Entire Table                
                    108168|                 455083
Index WBCROSSI~INC          
                    116832|                  59182
Index WBCROSSI~MAS          
                    139136|                  39533

TOTAL              364136    
--------------------------- ----------------------

Now I've made a copy of this table and added yet another index.
Check the sizes:

--------------------------- ----------------------
          Total Size in KB|      Number of Entries
--------------------------- ----------------------
Entire Table                
                     41216|                 455083
Index WBCROSSI_LB~INC       
                     10512|                  59182
Index WBCROSSI_LB~MAS       
                      8872|                  39533
Index PK_INDEX              
                    108464|                 455083
                    
TOTAL              169.064
--------------------------- ----------------------

WOW!
We see a difference of (364136 - 169064 = 195072, thanks calculator.exe!) 195072 KB or 190 MB or nearly 50% savings!!
I added and index an SAVED storage!
And no, I didn't use some unreleased super efficient compression technology here.

The same effect can easily been observed even with MaxDB 7.5 or earlier versions.

And? Curious now?

Like all good magic, the solution is simple (and a bit boring) once you know it.
So stop reading now, if you want to keep the magic 🙂

Ok, looking back at the initial table and index sizes gives a first hint:
In the original table all secondary indexes are actually LARGER than the table itself.
Why is that?
Let's check the table definition to answer this:

------------- ----------------------------------------
Column Name   |Data Type |Code Typ|Len  |Dec  |Keypos
------------- ----------------------------------------
OTYPE         |VARCHAR   |ASCII   |    2|     |    1
NAME          |VARCHAR   |ASCII   |  120|     |    2
INCLUDE       |VARCHAR   |ASCII   |   40|     |    3
MASTER        |VARCHAR   |ASCII   |   40|     |    4
STATE         |VARCHAR   |ASCII   |    1|     |    5
------------- ----------------------------------------

The important part here is the Keypos column.
You may notice, that ALL columns form the primary key of this table!
Although semantically correct and allowed in SQL and the relational model this is a rather seldom situation.

It's so seldom that it even contradicts one of the mentioned assumptions:
"the primary key is rather short compared to the whole row size."

With this (also pretty long 2+120+40+40+1 = 203 bytes) primary key the logical referencing is a game of keeping the same data over and over again.
An index entry e.g. for Index WBCROSSI~INC will look like this:

Index Name WBCROSSI~INC
Used: Yes               Access Permitted: Yes      Consistent: Yes
------------------------------------------------------------------------
Column Name                     |Type  |Sort
------------------------------------------------------------------------
INCLUDE                         |      |ASC
STATE                           |      |ASC
------------------------------------------------------------------------

Index key        Primary key
INCLUDE/STATE -> [OTYPE/NAME/INCLUDE/MASTER/STATE]

There we have it: since all columns are part of the primary key, we always double store the data for the index keys.

This makes it pretty obvious why the secondary indexes are larger than the table.

But what did I change to save the space?

I dropped the primary key!
In MaxDB a table gets a system generated surrogat primary key, if it does not have a defined one.
This generated primary key (hidden column SYSKEY, CHAR(8) BYTE!) is rather small and we don't have to copy the whole row into every index entry.

But we have to make sure that the primary key constraint features are still provided:
ALL of the columns have to be NOT NULLable and any combination of these columns need to be UNIQUE.

Nothing as easy as this!
I defined a NOT NULL constraint for every column and created a new unique index over all columns.
This is, by the way, the way how the ABAP primary key definition is mapped to the database on Oracle systems all the time!

WBCROSSI_LB
Column Name                     |Data Type |Code Typ|Len  |Dec  |Keypos
------------------------------------------------------------------------
OTYPE                           |VARCHAR   |ASCII   |    2|     |    0
NAME                            |VARCHAR   |ASCII   |  120|     |    0
INCLUDE                         |VARCHAR   |ASCII   |   40|     |    0
MASTER                          |VARCHAR   |ASCII   |   40|     |    0
STATE                           |VARCHAR   |ASCII   |    1|     |    0
------------------------------------------------------------------------

Indexes of Table: WBCROSSI_LB
------------------------------------------------------------------------

Index Name PK_INDEX
Column Name                     |Type  |Sort
------------------------------------------------------------------------
OTYPE                           |UNIQUE|ASC
NAME                            |UNIQUE|ASC
INCLUDE                         |UNIQUE|ASC
MASTER                          |UNIQUE|ASC
STATE                           |UNIQUE|ASC
------------------------------------------------------------------------

Index Name WBCROSSI_LB~INC
Column Name                     |Type  |Sort
------------------------------------------------------------------------
INCLUDE                         |      |ASC
STATE                           |      |ASC
------------------------------------------------------------------------

Index Name WBCROSSI_LB~MAS
Column Name                     |Type  |Sort
------------------------------------------------------------------------
MASTER                          |      |ASC
------------------------------------------------------------------------

By replacing the full row primary key (203 byte) with the SYKEY (8 byte) we save enough space in the secondary indexes that even the full table data copy in the new index does not make the size in total much larger.

Before you now go off and look for other tables where this 'compression' could be applied wait a minute.
As nothing comes for free in life, this of course also has it's price.

With the new setup, a primary key lookup now may lead to two seperate B*tree accesses (primary key index + table).
This will be especially true, when the optimizer cannot use the index only optimization (e.g. during joins).

Also the ABAP dictionary check will complain about this and the transportation of this setup will likely lead to problems.

Hope you enjoyed this piece of weekend magic with MaxDB!

Cheers,
Lars

MaxDB weekend magic - save 50% total storage space!

And? Curious now?

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win