werner_daehn

Over the last few years the reputation of Hana has grown constantly, while other people kept finding arguments why Hana is no better than existing databases. I, being an Oracle expert for more than 20 years, was among them, I have to admit. Looking back, it was rather a lack of understanding on my side and being trapped in SAP's marketing statements. You think that is a lame excuse? You are probably right.

So let me take you on my journey with Hana and share some internals you have never read elsewhere before.

The first argument came in 2010, mostly from Oracle, and back then was - understandably - "Of course keeping everything in your head is much faster, but that is simply not doable." If I may translate that, the statement was: memory is too limited and too expensive. True enough, even today. What is the price of a hard disk with 1TB and, in comparison, how much does a server with 1TB of memory cost? A completely reasonable argument.

Actually, I just dug up a YouTube video and it is fun to watch, even today.

SAP argued at the time that you compress the data and hence do not need as much memory. We all know how compression works and the costs involved, so I did not find that very convincing.

What struck me even more, however, was the fact that traditional databases cache data in RAM as well, so they are in-memory, so to speak - except that only the frequently accessed data is cached and archived data does not eat into your memory budget.

What I hated most about Hana was the row versus column storage. Marketing touted that thanks to columnar storage you can aggregate data much faster, and when confronted with the question of reading an entire row with all columns, the response was "we have row storage as well". Excellent answer. So now I would need both a row and a columnar storage for each table? You cannot be serious.

With this kind of mindset I started to look at Hana, ran typical performance tests and quickly found out that something was severely wrong with my assessment of Hana. Thankfully the Hana developers took the time to engage with me and provided internals that explained what I had missed before.

Let me try to show you.

The three/four technologies

According to marketing, the benefit of Hana comes from the top three technologies shown below. I am sure you have been bombarded with the same arguments: Hana is in-memory and therefore it is fast. And columnar. It does compression!

I can fully understand now why people, including myself, were skeptical. None of these technologies is new, as Larry Ellison stated in the video above, mentioning Oracle's TimesTen product as an example.

The secret to Hana's performance is, in my opinion, not the three technologies as such; it is the intelligent combination of these three plus the insert-only approach. The reason I say that: looking at each technology individually, all have advantages but severe disadvantages as well. Memory is limited. Compression is CPU intensive. Columnar storage puts column values closely together, whereas row storage keeps each row's data as one block. Insert-only requires dropping outdated versions from time to time.

  • In-Memory
  • Compression
  • Columnar Storage
  • Insert-Only

In-Memory

The basic idea is that memory is much faster than disk - many times faster, in fact. A 2014 CPU has a memory bandwidth of 10GByte/sec and higher, a single disk around 150MByte/sec - a difference of factor 70. If your program uses the same memory frequently, it is cached inside the CPU's L1 or L2 cache, which speeds up the effective bandwidth by another factor of 10. In contrast, the disk speed of 150MB/sec holds for sequential read access only; random access is many times slower on a disk system, whereas it has no negative impact on RAM.
The downside of memory is the cost of the memory chips themselves (7USD/GByte for RAM compared to 0.05USD/GByte for disk as of 2014), and the hardware platform needed to cope with more memory gets increasingly expensive as well.
On the other hand, if I need 1TB of RAM, that would be 7000USD. While this is a lot of money compared to a single 100USD disk, it is not much in absolute terms.
But you can turn my argument around and simply say: if you have a 1TB database on disk, use a server with 1TB of memory so that all data can be cached.
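To make these numbers tangible, here is a small back-of-envelope calculation in Python. The figures are the 2014 ballpark values quoted above, not measurements, so treat the results as rough orders of magnitude only.

```python
# Back-of-envelope calculation with the 2014 ballpark figures from the text.
ram_bandwidth_mb_s = 10_000      # ~10 GByte/sec memory bandwidth
disk_bandwidth_mb_s = 150        # ~150 MByte/sec sequential disk read

print(ram_bandwidth_mb_s / disk_bandwidth_mb_s)   # ~67, the "factor 70"

ram_usd_per_gb, disk_usd_per_gb = 7.0, 0.05
print(1024 * ram_usd_per_gb)                      # ~7,000 USD of RAM for a 1TB database
print(ram_usd_per_gb / disk_usd_per_gb)           # RAM is ~140x the price per GByte
```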

So the argument "in-memory!" cannot be the entire truth.

Compression

The idea of compression is simple: a single CPU is much faster than the memory bus and the disk, not to mention that multiple CPUs share the same bus, hence compressing data to reduce its volume is beneficial as long as the overhead is not too large. Therefore every major database supports compression. It is not very popular though, as compressing and decompressing database blocks takes its toll. The most obvious cost shows up when data is updated inside a database block: you have to uncompress the database block, make the change and compress it again - and hope it still fits into the same database block, otherwise you need a second one.
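A minimal sketch of that update overhead, using Python's zlib on a fake fixed-width block. This only illustrates the decompress-modify-recompress cycle; it is not how any particular database implements its block format.

```python
import zlib

# A fake "database block" holding 100 fixed-width rows of 80 bytes each.
rows = [f"row-{i:05d}".ljust(80, " ").encode() for i in range(100)]
block = zlib.compress(b"".join(rows))

def update_row(compressed_block: bytes, row_no: int, new_value: bytes) -> bytes:
    data = bytearray(zlib.decompress(compressed_block))          # 1. uncompress the block
    data[row_no * 80:(row_no + 1) * 80] = new_value.ljust(80, b" ")  # 2. make the change
    return zlib.compress(bytes(data))                             # 3. compress it again

new_block = update_row(block, 42, b"row-00042-updated")
# If the recompressed block no longer fits into its original slot,
# the database needs a second block - exactly the overhead described above.
```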

So the argument "Compression!" cannot be the entire truth.

Columnar storage

For a simple select sum(revenue) the columnar storage is just perfect. You have to read one column only, hence just a fraction of the whole table data is needed. Imagine all the data of one column sitting in one file: this will be much faster than with traditional row-oriented tables, where all the table data is in one file (or one database object, to be more precise) and you have to read the entire table in order to get at each row's column value.
In case you want to see all columns of a single row, as is typical for OLTP queries, the row storage is much better suited.
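The difference is easiest to see with a toy example. The table and column names below are made up; the point is only how much data each layout has to touch for an aggregation versus a single-row lookup.

```python
rows = [  # row store: every row keeps all of its columns together
    {"order_id": 1, "customer": "A", "revenue": 100.0, "comment": "..."},
    {"order_id": 2, "customer": "B", "revenue": 250.0, "comment": "..."},
    {"order_id": 3, "customer": "A", "revenue": 75.0,  "comment": "..."},
]

columns = {  # column store: one contiguous array per column
    "order_id": [1, 2, 3],
    "customer": ["A", "B", "A"],
    "revenue":  [100.0, 250.0, 75.0],
    "comment":  ["...", "...", "..."],
}

# select sum(revenue): the row store drags every column of every row through memory,
# the column store reads only the revenue array - a fraction of the table.
print(sum(r["revenue"] for r in rows))
print(sum(columns["revenue"]))

# "Give me all columns of order 2" (typical OLTP): here the row store wins.
print(rows[1])                                   # one lookup
print({c: v[1] for c, v in columns.items()})     # one lookup per column
```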

So the argument "Columnar Storage!" cannot be the entire truth.

Insert only

A real database should provide read consistency in the sense that when I execute my select statement I get all data committed at that point in time, but I will neither see data committed afterwards nor will my long-running query fail just because an old value was overwritten by the database writer.
The only major database I know that has supported this from the beginning is Oracle (SQLServer has an option to turn it on/off), but you have to pay a price for that consistency. Whenever the database writer overwrites a block with new data, the old version has to be kept somewhere - in the rollback segments of the database, in the case of Oracle. So a simple update or insert into an existing block requires two operations: the actual change plus saving the old version.
With insert-only the idea is a different one: the existing data in a table is never overwritten, new versions are only appended. If you execute an update against an existing row, a new row is written at the end of the table with the same primary key value plus a transaction number. When a select statement is executed, it will see multiple versions and use the one with the highest transaction number that is less than or equal to the currently active global transaction number in the database.
There is a tradeoff, hence: tables grow fast with this approach, especially when you update just a single byte, but on the other hand writes are faster.
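Here is a small sketch of that version-selection rule. The table layout and function names are invented for illustration; the point is that an "update" is just another appended row, and each reader picks the newest version visible to its snapshot.

```python
table = []  # each entry: (primary_key, txn_no, value) - appended, never overwritten

def write(pk, value, txn_no):
    table.append((pk, txn_no, value))

def read(pk, snapshot_txn):
    # all versions of this key that were committed at or before the snapshot
    versions = [(txn, val) for key, txn, val in table
                if key == pk and txn <= snapshot_txn]
    # the highest transaction number <= the snapshot wins
    return max(versions, key=lambda v: v[0])[1] if versions else None

write("MAT-1", {"price": 10}, txn_no=100)
write("MAT-1", {"price": 12}, txn_no=105)   # "update" = another appended row

print(read("MAT-1", snapshot_txn=103))  # {'price': 10} - old snapshot stays consistent
print(read("MAT-1", snapshot_txn=110))  # {'price': 12} - newer readers see the update
```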

So the argument "Insert only!" cannot be the entire truth.

Combining the technologies

Okay, this was my starting point. All of these technologies are good ideas; other companies have tried them as well and built proper niche products, mostly around analytic use cases. One database is a pure in-memory database that needs a regular database to persist the data; in others you can insert data but not update or delete it. Many support compression, but it is usually turned off by customers.

The claim of Hana is to be a database that supports analytic and transactional use cases and is better than other databases in all areas. That should be easy to put into perspective, I thought.

However, the one thing I did overlook at that time was how these technologies benefit from each other. So let us go through a couple of mutual benefits to get the idea.

Combination of Compression with Columnar Storage +

Compressing data works best whenever there is a repetitive pattern. Let us have a look at a real example, the material master table.

MANDT | MATNR              | ERSDA      | VPSTA    | LVORM | MTART | MBRSH | MATKL
800   | 000000000000000023 | 23.01.2004 | K        | false | ROH   | 1     |
800   | 000000000000000038 | 04.09.1995 | KDEVG    | true  | HALB  | M     | 00107
800   | 000000000000000043 | 23.01.2004 | KBV      | false | HAWA  | 1     |
800   | 000000000000000058 | 05.01.1996 | KLBX     | false | HIBE  | M     |
800   | 000000000000000059 | 05.01.1996 | KLBX     | false | HIBE  | M     |
800   | 000000000000000068 | 12.01.1996 | KEDPLQXZ | false | FHMI  | A     | 013
800   | 000000000000000078 | 10.06.1996 | KVX      | true  | DIEN  | M     |

What can you compress better, the entire file or a single column?
The answer is obvious, but nevertheless I exported these columns from the MARA table, all 20'000 rows of my system, into a CSV file (1'033KB big) and zipped the one file with all data plus nine files with one column each.

Obviously the primary key cannot be compressed much - it is half of the data, in fact - but all other columns can: the MANDT file is 303 bytes, the ERSDA file with all the many creation dates is 12'803 bytes.
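If you want to repeat the experiment on your own data, a sketch like the following does the comparison. The sample rows here are synthetic stand-ins, not my MARA export, so the absolute sizes will differ.

```python
import csv, io, zlib

# Synthetic stand-in for the exported table: a few MARA-like columns, 20'000 rows.
sample = [
    {"MANDT": "800", "MATNR": f"{i:018d}", "ERSDA": "23.01.2004",
     "MTART": "ROH" if i % 2 else "HALB"}
    for i in range(20000)
]

def csv_bytes(rows, fields):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode()

fields = ["MANDT", "MATNR", "ERSDA", "MTART"]
whole = csv_bytes(sample, fields)
print("all columns :", len(zlib.compress(whole)))
for f in fields:   # one "file" per column, like the single-column files above
    col = csv_bytes(sample, [f])
    print(f"{f:12}:", len(zlib.compress(col)))
```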

But this is not a fair comparison, as the zip algorithm favors larger datasets - it can look for patterns more easily - and it is a fairly aggressive algorithm. In databases the compression is weaker so that it costs fewer CPU cycles, and more importantly, each file is split into database blocks. That means if you have a fairly wide table, one database block might contain only one row of data to compress - hence almost no compression is possible at all.
Columnar storage does not suffer from that side effect.

So as you can see, the technology of columnar storage has a huge positive side effect on the degree of compression that is possible.

Combination of Compression with In-Memory +

That's an easy one. The more we compress, the less memory we need. Before we said that the price per GByte of RAM is roughly 140 times that of disk. Thanks to compression, usually a factor of 10, the effective gap shrinks to about 14. Or, more importantly, for your 10TB database you do not need a server with 10TB of RAM, which would be very expensive.

Combination of Compression with Insert-only +

Compression has one important downside as well, however: what if a row is updated or even deleted? The compression spans multiple rows, so when you change a value you have to uncompress the entire thing, change the value and compress it again. Traditional databases do exactly that. Okay, they do not uncompress the entire table - the table data is split into pages (database blocks) and hence only the impacted page has to be recompressed - but you can still virtually watch how much this recompression slows down updates/deletes in traditional databases.
So what does Hana do? Quite simple: it does not update and delete the existing data. It appends the change as a new version with a transaction ID, and when you query the table you will read, for each row, the latest version committed before the query execution started. So suddenly no recompression is needed anymore: data is appended uncompressed to the end of the table, and once the uncompressed area exceeds a limit, it gets compressed and new data is inserted into a new page of the table.
The other advantage of that approach: if a single row is updated multiple times, which row will that be? A booking made 10 years ago? Very unlikely. It will be a recent one - one that is most likely still in the uncompressed area.
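The following sketch shows the write-into-an-uncompressed-delta, merge-later pattern described above. The class and threshold names are made up and the merge is vastly simplified compared to Hana's real delta merge; it only illustrates why appends never trigger a recompression of existing pages.

```python
import json, zlib

class ColumnStoreSketch:
    def __init__(self, merge_threshold=1000):
        self.main = []                      # compressed chunks (read-optimized)
        self.delta = []                     # recent rows, uncompressed (write-optimized)
        self.merge_threshold = merge_threshold

    def insert(self, row):
        self.delta.append(row)              # appends are cheap: nothing is recompressed
        if len(self.delta) >= self.merge_threshold:
            self.merge()

    def merge(self):
        # compress the accumulated delta into a new read-only chunk
        self.main.append(zlib.compress(json.dumps(self.delta).encode()))
        self.delta = []

    def scan(self):
        for blob in self.main:
            yield from json.loads(zlib.decompress(blob))
        yield from self.delta               # recent rows, still uncompressed

store = ColumnStoreSketch(merge_threshold=3)
for i in range(7):
    store.insert({"id": i, "value": i * 10})
print(list(store.scan()))
```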

Combination of Columnar Storage with Insert-only +

As the data is only inserted at the end, finding the data of a row in the various columns is very simple. According to the current transaction id, the row to read is the version at table position 1'234'567; hence the task is to find, for each column, the value at that position.

Just imagine an Excel sheet. What is faster: Reading the cells (A,10) + (B,10) + (C,10), in other words the row 10 with the three columns A, B and C?

Or reading the cells (J,1) + (J,2) + (J,3), in other words the column J with the values in the three rows 1, 2 and 3?

It does not make a difference. None at all. The entire argument that reading row-wise is better is based on the assumption that it is faster to read horizontally than vertically. That is true if the data is stored on disk in a horizontal fashion: the data of one row sits closely together and is hence read in one go from the disk cylinder. But in a memory system it does not matter at all.
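In code the Excel argument looks like this. The sample values are taken from the MARA excerpt above; reconstructing one row from a column store is just one indexed access per column array, and in RAM it is irrelevant that those arrays live far apart.

```python
# Three column arrays of the same table, stored independently in memory.
matnr = ["000000000000000023", "000000000000000038", "000000000000000043"]
ersda = ["23.01.2004", "04.09.1995", "23.01.2004"]
mtart = ["ROH", "HALB", "HAWA"]

position = 1   # the row position determined via the transaction id
row = {"MATNR": matnr[position], "ERSDA": ersda[position], "MTART": mtart[position]}
print(row)     # three O(1) lookups - no disk seek, no locality penalty
```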

Putting all four together +++

What was the issue with compression?

  • Compression works best on similar data -> one column often has similar values -> solved
  • Recompression in case something changes -> we do not change data, we insert only -> solved
  • Compression is CPU expensive -> no pkzip-like compression is used but dictionary and pattern compression (see the sketch below) -> reading the compressed data can even be faster than reading the plain data
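As a taste of what such light-weight compression looks like, here is a minimal dictionary-encoding sketch. The column values are taken from the MARA example above; the exact encoding Hana uses differs in many details.

```python
# Dictionary encoding: each distinct value is stored once,
# the column itself becomes an array of small integer value-ids.
mtart_column = ["ROH", "HALB", "HAWA", "HIBE", "HIBE", "FHMI", "DIEN", "ROH"]

dictionary = sorted(set(mtart_column))              # ['DIEN', 'FHMI', 'HALB', ...]
value_id = {v: i for i, v in enumerate(dictionary)}
encoded = [value_id[v] for v in mtart_column]       # [5, 2, 3, 4, 4, 1, 0, 5]

# Scanning for MTART = 'HIBE' compares small integers instead of strings,
# which is why reading the compressed column can beat reading the plain data.
matches = [pos for pos, vid in enumerate(encoded) if vid == value_id["HIBE"]]
print(dictionary, encoded, matches)
```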

What was the issue with memory?

  • More expensive than disk -> thanks to compression that factor is dampened -> reality has proven that Hana can run even large enterprise ERP systems

What was the issue with Columnar Storage?

  • You need to locate the row value for each column individually -> but actually, for memory it does not matter whether you read two words from nearby or far-away addresses. With compression this can even be faster!
  • Changing values requires rewriting the entire column -> true, hence Hana does not change values, it only appends data.

What was the issue with Insert-only?

  • More and more old data accumulates and needs to be removed, or memory consumption grows fast -> the most recent data is not added to the compressed main storage, it is kept in the delta storage, and all changes are handled within that delta storage to allow updates.
  • The above problem arises only if rows in the compressed storage area are changed -> less likely, but possible.

Comparing Hana with other databases

Meanwhile the other database vendors have been confronted with the same realities, and we have seen various announcements of them jumping on the in-memory bandwagon as well. Given your knowledge of the Hana internals now, you should be able to quantify the advantage for yourself.

Oracle: With Oracle 12c there is an in-memory option available. This option allows you to store the data, in addition to the traditional disk-based way, in an in-memory area as well. Personally I have mixed feelings about that. On the one hand it is very convenient for existing Oracle users: you execute a couple of statements and suddenly your queries are many times faster.

But this assumes that the traditional disk-based storage does have advantages, and I tried to show above that it does not. Hence it is a workaround only, kind of what I criticized when the response was "Hana has column and row storage". And it leaves room for questions like:

  • What are the costs to store the data twice when inserting/updating data?
  • To what degree is the database slowed down if part of the memory used for database block caching is now reserved for in-memory storage?
  • No question this will be faster on the query side, but the OLTP load worries me.

To me that sounds like the dinosaur's way: trying to sound flexible when all you actually do is double the costs. Obviously I lack the hands-on experience, but sorry Oracle - I was a big fan of yours, but that does not sound like a mastermind's plan to me.

A nice read I found: Rittman Mead Consulting: Taking a Look at the Oracle Database 12c In-Memory Option

Do you concur with my opinion?
