cancel
Showing results for 
Search instead for 
Did you mean: 

Compression taking more space than non-compressed?

Former Member
0 Kudos

Hello,

I did an analysis on our top 100 column store objects in HANA to see if we have a potential to save some memory (This is a SAP BW system).

I found one sample in M_CS_COLUMNS where the compression rate was > 100...meaning the non compressed column was smaller than the compressed column.

Overall the compression rates are looking brilliant but after I checked for all our top 100 sized column store tables in M_CS_COLUMNS I found ~500 columns that show a smaller UNCOMPRESSED_SIZE vs. MEMORY_SIZE_IN_MAIN size...In total we talk about 19GB of memory the uncompressed columns would utilize less than then compressed ones.

UNCOMPRESSED_SIZE < MEMORY_SIZE_IN_MAIN

This is the top column entry which shows even a 3 times bigger compressed size:

HOSTPORTSCHEMA_NAMETABLE_NAMECOLUMN_NAMEPART_IDMEMORY_SIZE_IN_TOTALMEMORY_SIZE_IN_MAINMEMORY_SIZE_IN_DELTAUNCOMPRESSED_SIZECOMPRESSION_RATIO_IN_PERCENTAGECOUNTDISTINCT_COUNTCOMPRESSION_TYPEINDEX_TYPEINDEX_LOADEDIMPLEMENTATION_FLAGSLAST_ACCESS_TIMELOADEDLAST_LOAD_TIME
xxxxxxxxxxxx/BIC/XxxxxxSID01.097.650.8121.096.242.9041.407.908330.789.114331,8382.696.00682.652.260DEFAULTFULLLOADED1705.10.2015 07:05:24.448912TRUE04.10.2015 08:02:54.739835

How can this be explained and shall I change the COMPRESSION_TYPE?

Thank you

Florian

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

Ok, this is not a mystery at all.

Compression in SAP HANA mostly works by replacing duplicate data entries with pointers or aggregated entries (e.g. the repeating value 10 10 10 10 could be stored as 4 x 10 and so on).

Now the column store stores information in a split up way.

1. it stores all different values once (that's what we call the dictionary)

2. it saves the occurrence of a value in a record by storing a pointer (up to 4 byte integer) to the dictionary entry.

This approach works pretty well for values that take more space than the pointer and that are repeated often. Think of the three character (6 Unicode bytes) long, ever repeating client (MANDT) column.

Here we store the value once and only point to it with actually just 2 bits - since the dictionary only contains 1 value.

Now look at your example:

The number of different values (distinct values) is nearly as many as you've got rows in your table.

That means, the SID here is close to be unique; which gives us little space to compress 'duplicates'.

Then, the data type of the SID is usually an integer, which means, even if we would have lots of duplicates, the saving by using a reference instead of the value might be rather small.

Finally, this column has an inverted index set up with it. This means, that the pointers to the value occurrences are actually stored twice.

One time linking the occurrence to the value and one time linking the value to the occurrences (hence inverted index).

And there we have your threefold space requirement.

There's plenty of information available meanwhile how column store compression works (not the least one to mention Richard's and my book ) so I think this should answer the question well enough for now.

Former Member
0 Kudos

Hi Lars,

I did not expect the column to have any good compression, but what triggered this post was the fact that it showed a more than 3 times bigger size compared to the value in uncompressed_size.

Now the fact that storage space for the index is being added explains this, as well as I meanwhile found that I should have queried M_CS_ALL_COLUMNS rather than M_CS_COLUMNS for such a detailed analysis of the columns.

In M_CS_ALL_COLUMNS the memory utilization is better separated and it becomes self explaining.

The numbers for the same table queried in M_CS_ALL_COLUMNS:

MAIN_MEMORY_SIZE_IN_DATAMAIN_MEMORY_SIZE_IN_DICTMAIN_MEMORY_SIZE_IN_INDEXMAIN_MEMORY_SIZE_IN_MISC
476.503.120651.832.232289.211.5361.968
MEMORY_SIZE_IN_MAIN
TOTAL =1.417.548.856

Btw. Thank you for the SAP HANA Administration book.

I've just re-read the sections for indexing and compression.

lbreddemann
Active Contributor
0 Kudos

What can I say? Thanks for buying the book !

lbreddemann
Active Contributor
0 Kudos

Not sure you saw them before, but you might find

and interesting/amusing/entertaining.

Answers (0)