former_member184494

Why do we need compression?

Tables that store row-level data are good candidates for compression because:

1. Less space is consumed.
2. CPU advances permit us to compress and uncompress the tables in memory and do the necessary analysis while reducing disk space.
3. I/O time improves because less data has to be fetched, which reduces seek and transfer times.
4. A larger portion of the table fits in memory because the data is smaller.

Compression in a row store

In a row store, compression would mean removing rows with zeroes and looking for repeated keys, which lend themselves to compression.
Because an attribute is stored as a part of an entire record, combining the same attribute from different records together into one value would require some way to “mix” tuples.

Compression in a column store

Storing data in columns gives compression algorithms more opportunities to improve performance than a row-oriented architecture does. In a column-oriented database, compression schemes that encode multiple values at once are natural.
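To make "encoding multiple values at once" concrete, here is a minimal sketch using run-length encoding, which is one scheme of that kind. The table contents and the names rle_encode and rle_decode are made up for illustration; this is not how any particular database implements it.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]

# Hypothetical table, as a row store would hold it: one tuple per record.
rows = [
    (1, "DE", "open"),
    (2, "DE", "open"),
    (3, "DE", "closed"),
    (4, "US", "closed"),
    (5, "US", "closed"),
]

# The same data as a column store would hold it: one list per attribute.
country = [r[1] for r in rows]
status = [r[2] for r in rows]

country_runs = rle_encode(country)  # [("DE", 3), ("US", 2)]
status_runs = rle_encode(status)    # [("open", 2), ("closed", 3)]
```

In the row layout the repeated "DE" values are separated by the other attributes of each record, so there is no run to collapse. In the column layout they sit next to each other and a single pair covers several rows.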

How else can data be compressed? This leads into the lossless compression techniques used for file compression. I do not want to go into details and digress, but the net outcome is this.

You have a table with N rows stored in the database.
Each row maps to a single two-dimensional point on the disk, and this data can be compressed and stored. Why should it be compressed? It saves space, and moreover not all of the data is needed all the time. The system compresses the data and uncompresses it when required. This way:

1. Disk space is optimized.
2. Data is uncompressed and fetched only when required.

Things to consider here:
Uncompression takes a fair amount of CPU, whereas in most cases disk space is inexpensive, so the trade-off has to be weighed.
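The compress-on-store, uncompress-on-read cycle described above can be sketched with Python's standard zlib module. This is only an illustration of the trade-off, assuming a made-up table serialized as text; a real database compresses pages or blocks with its own format.

```python
import zlib

# Hypothetical table: 1,000 rows serialized as comma-separated text.
rows = [f"{i},DE,open" for i in range(1000)]
raw = "\n".join(rows).encode("utf-8")

# Compress before "storing". This costs CPU once, at write time.
stored = zlib.compress(raw)

# Uncompress only when the data is actually needed. This costs CPU per read.
fetched = zlib.decompress(stored)

print(len(raw), "bytes raw,", len(stored), "bytes stored")
```

The compression is lossless, so fetched is byte-for-byte identical to raw. The exact sizes depend on the data and on the compression level, which is why no figures are quoted here.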

In row stores the number of records to be compressed and uncompressed is still large, i.e. the compression ratio cannot be improved further.
In a column store, however, the data can be compressed further because values within a column repeat. Also, the indices on the columns will add up to less than the indices on the row store.

Thus it becomes apparent that column stores naturally lend themselves to compression, and the compressed table can be quite small. Also, column stores compressed with some common compression mechanisms do not need to be decompressed to be read; they can be read directly.

For example, if a compressed column says the value “42” appears 1000 times consecutively in a particular column for which we are computing a SUM aggregate, the operator can simply take the product of the value and run-length as the SUM, without having to decompress.
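The SUM example can be written out as a short sketch. It assumes the column is held as a list of (value, run_length) pairs; the function name rle_sum and the second run of sample data are hypothetical.

```python
def rle_sum(runs):
    """SUM over a run-length encoded column without expanding the runs."""
    return sum(value * length for value, length in runs)

# The value 42 appearing 1000 times consecutively, then 7 appearing 3 times.
runs = [(42, 1000), (7, 3)]

total = rle_sum(runs)  # 42 * 1000 + 7 * 3 = 42021

# For comparison: the same aggregate after decompressing the column.
decompressed = [value for value, length in runs for _ in range(length)]
assert total == sum(decompressed)
```

The compressed version does one multiplication per run, while the decompressed version touches every one of the 1003 values.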

I will discuss compression techniques...

 Cheers!