Today, when we talk about SAP HANA (High Performance Analytical Appliance ) - In Memory Computing Engine, the main question arises in each client/customer/developer’s mind is about Row storage and column storage. Since last couple of decades row store is very popular technique for data storage. Column storage though came into picture long back; there was no commercially popular database in the market. The main reason was processor speed and size. Now, with fast processing power we have 64-bit processor easily available in market. This blog is a small attempt to throw some light on this storage technique.
So let’s begin the tour...
Row storage is the conventional technique that is used in earlier databases to store the data in the tabular form. In row storage, data is inserted in form of tuple. Each tuple is nothing but a row which is unique identification of each record. There are many advantages of row store but the main or important advantage is easy to add or modify a record. So the systems where write back to database operation is required in that case row store is easier than column store.
- Row Store tables have a primary index
- Row ID maps to primary key
- Secondary indexes can be created
- Recommended when the tables contain less volume of data.
- Used when the application request has to access the entire row.
Though HANA is a hybrid database which uses both row as well as column store techniques, I have given more emphasis on column store as the whole data modeling happens under row store technique.
A column oriented database (also called as C-stores) stores the content column wise rather than by row. Though this technique is totally opposite to the conventional database, it is very useful in data compression, Storing data by column results in better compression than the conventional row store. It also performs well while query processing since queries read only the columns which are required.
Dictionary or bitmap encoding are some of the row store encoding techniques which are also used in column store.
Conventional databases i.e. row store are good for write operations therefore database with row store architecture is also called as write optimized system. This type of architecture is effective especially in OLTP systems.
On the other hand, system which are used for ad hoc querying with huge volume of data are read optimized, i.e. analytical or data warehouse systems where data is mostly used for reporting purpose. These systems represents read optimized in which bulk amount of data load is performed with ad hoc queries. Other examples of read optimized systems are CRM (Customer Relationship Management), Electronic Library catalogues and other ad hoc inquiry systems. In such environments, a column store architecture, in which the values for each single column are stored contiguously, should be more efficient.
- Column store databases stores a dictionary of every distinct value that occurs in the column.
- For each distinct value one needs to maintain a list that tells, for which row a specific value occurs
- When we want to get a row/record from these values we need to go through each and every column and find the appropriate match. This is nothing but index inversing.
Advantages of column store:
- Improves read functionality significantly, also improves write functionality
- Highly compressed data
- No real files, virtual files
- Optimizer and Executer – Handles queries and execution plan
- Delta data for fast write
- Asynchronous delta merge
- Consistent view Manager
- Recommended when the tables contain huge volumes of data.
- Used when lot of aggregations need to be done on the tables.
- Supports parallel processing.