Sybase IQ and HANA. Tables. Different size on disk

Former Member
0 Kudos

Dear community.

The same table in IQ (15.4 ESD2) and HANA (SP5).

But the size on disk is different. Why?

IQ - 2 GB

HANA - 5.1 GB

Accepted Solutions (0)

Answers (2)

Former Member
0 Kudos

Another table as well (wider, with 147 million rows).

More than 7 times the difference. Interesting why? Where is the compression factor? Do we save disk space in HANA?

In Oracle 11g the same table is also around 120-125 GB.

IQ - 16 GB

HANA - 118 GB

lbreddemann
Active Contributor
0 Kudos

Mikhail Budilov wrote:

Another table as well (wider, with 147 million rows).

More than 7 times the difference. Interesting why? Where is the compression factor? Do we save disk space in HANA?

In Oracle 11g the same table is also around 120-125 GB.

IQ - 16 GB

HANA - 118 GB

Again: we don't optimize for disk space.

Why would we?

Check the data size in memory: HANA: 18.5 GB.

And HANA works directly on this compressed data structure. No "unzipping", no copying of the uncompressed data somewhere else...

cheers,

Lars

Former Member
0 Kudos

Dear Lars.

I don't understand.

What is faster - reading 18 GB from disk, unzipping it and writing it to memory, or reading 118 GB from disk and writing it to memory?

Reading uncompressed data from disk is slower than reading compressed data and unzipping it on the fly, isn't it?

lbreddemann
Active Contributor
0 Kudos

Hi Mikhail,

the point is: HANA reads the data in parallel and usually only once.

After the initial load into memory it stays there.

And while it could easily be that serial loading is faster with IQ, this really doesn't matter too much for the execution of queries.

And unzipping data over and over again can be very time consuming depending on the available resources.
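For illustration, you can pre-load a table into memory and check its load state yourself. This is just a sketch - the LOAD/UNLOAD statements and the LOADED column of m_cs_tables are from memory, so please verify them on your revision and substitute your own table name:

load <table_name> all;   -- load all columns of the table into memory

select loaded from m_cs_tables where schema_name='<schema_name>' and table_name='<table_name>';   -- should show 'FULL' once the table is resident

unload <table_name>;   -- explicitly displace the table from memory again

After that first load, queries work on the in-memory structures only.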

For me the question still is: what do you actually want to know?

cheers,

Lars

Former Member
0 Kudos

Lars, I just want to know what SAP HANA is and how it works.

When you have the CPU power of the latest E7 with 40 and more cores, you can unzip TBs without any problems.

Especially if you load data from disk to memory on demand (on first access to the data).

My Basis team told me that 1 TB of hi-end storage costs in the 80-100K EUR range.

Why should every company around the world spend this additional money for every 256-512 GB RAM HANA box...

That's tens of billions of EUR or more.

And it increases backup time, backup space, tapes...

Former Member
0 Kudos

Today the same table in HANA - unbelievable:

HANA - 153 GB

Still no idea why )


lbreddemann
Active Contributor
0 Kudos

Mikhail,

would you mind sharing more details about your test case?

What revision of HANA do you use?

What do you do with the data (bulk load, updates, deletes, merges)?

What kind of statements do you run against this data?

Please do also post the output of the following queries:

select * from m_cs_tables where schema_name='<schema_name>' and table_name='<table_name>';

select * from m_cs_columns where schema_name='<schema_name>' and table_name='<table_name>';
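If attaching the full output is too much, even just the size-related columns would already help. Something along these lines should do (the column names are from memory, so double-check them on your revision):

select table_name, record_count, memory_size_in_total, memory_size_in_main, memory_size_in_delta from m_cs_tables where schema_name='<schema_name>' and table_name='<table_name>';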

Your example is anything but the "normal" case for HANA.

Cheers, Lars

Former Member
0 Kudos

Lars,

HANA SP5 (rev 45). HANA box with 40 CPUs, on a midrange array, and zero activity.

>  What do you do with the data (bulk load, updates, deletes, merges)?

Table TST, which was bulk loaded from IQ by DS.

After that, table TST2 was created and I executed:

insert into TST2

select *

from TST

> Please do also post the output of the following queries:

Attached.

lbreddemann
Active Contributor
0 Kudos

Mikhail,

HANA optimizes and compresses data for usage in main memory.

The on-disk representation is usually not as well compressed and that is on purpose.

The data structures on disk and what HANA works on in memory are quite different - so looking at what's on disk is not a 'fair' comparison.

It's apples vs. oranges, since IQ (and other DBMSs) pretty much write data to disk in the same form they operate on.
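If you want to compare figures that belong to the same system, put the in-memory size next to the on-disk size. A rough sketch - the view m_table_persistence_statistics and its disk_size column are from memory, so verify they exist on your revision:

select t.table_name, t.memory_size_in_total as memory_bytes, p.disk_size as disk_bytes
from m_cs_tables t
join m_table_persistence_statistics p on p.schema_name = t.schema_name and p.table_name = t.table_name
where t.schema_name='<schema_name>' and t.table_name='<table_name>';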

cheers,

Lars

Former Member
0 Kudos

Hi Lars,

am I right that you are talking about a row storage format for the data on disk? Or is it something else (additional indexes)?

I want to understand what is consuming so much disk space.

lbreddemann
Active Contributor
0 Kudos

Denis,

whether it's column store data, row store data or anything else (yes, there are more data types that can be persisted in HANA) has nothing to do with what you're seeing here.

You're observing a feature of the column store. (yep - not a bug!)

While the column store always uses a value dictionary, which provides good compression for columns with a high number of repetitive values, it also offers additional compression algorithms on top of that.

Some of these compression algorithms are based on the way the data is located within the column main store. To optimize compression factors, the data in a column obviously can be re-ordered (don't confuse this with sort order in SQL!). Of course, if you re-order one column you have to adapt the remaining columns as well - otherwise you can't re-fit the column values to a row correctly.

Depending on the data in the columns (and data changes over time!) it might become beneficial to change the ordering.

All this is done in memory when the data is loaded the first time from the disk.

What is stored on disk is the pure column store data in its uncompressed form (but still with the value dictionary).

On top of that: the data on disk is stored for optimal read performance (disk to RAM).

On top of that: data is stored in containers of variable sizes - they might allocate more disk space than 8K block/page containers.
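By the way, you can see which compression HANA actually picked per column. A sketch along these lines (the compression_type and distinct_count columns of m_cs_columns are from memory, so verify them on your revision):

select column_name, compression_type, distinct_count, memory_size_in_total
from m_cs_columns
where schema_name='<schema_name>' and table_name='<table_name>'
order by memory_size_in_total desc;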

BUT, all this is really not an issue for HANA systems.

Because HANA system sizing is based on the data amount, and so is the storage. So you never have to go to your infrastructure team and ask for more storage.

When performing backups, HANA doesn't copy over the files. It reads just the "payload" data in those files and stores that in the backup.

If you actually want to compare different data structures over different DBMS then you should at least take the data structure HANA actually works on: the in memory data structure.

Still: any comparison of this kind is a bit futile. Or did you ever compare an Oracle bitmap index with a MS SQL Server clustered index to see the size difference?

cheers,

Lars

Former Member
0 Kudos

Lars,

I'm trying to figure out why it is stored in this form. It's a good storage method if your data is stable (not changing often), but how does it work with data updates? If data changes on the OLTP side, how do the changes arrive in HANA - first in memory and then on disk, or the reverse logic - disk first and memory after that? The latter would not be so fast if the data is stored in uncompressed column form.

That is the core of my question. I'm not arguing about the used space, I'm asking about the benefit (purpose) of this method.

I'm still trying to keep in mind that first of all HANA is a real-time platform and it must control such processes.

lbreddemann
Active Contributor
0 Kudos

In HANA changed data is not directly written to the main store part of the column store but instead to a write-optimized structure called "delta log".

Technically all data is handled first and foremost directly in-memory and written to disk only when absolutely necessary.

E.g. the REDO log (caution: this is different from the delta log!) is written out at commit time at the latest.

And you're right: as long as the data is in the "delta log" and uncompressed, query performance is not at the maximum possible level. To deal with that, HANA employs internal algorithms to decide when actually moving the data to the main store and compressing it there would be worth the effort.

The benefits of putting data in this form are a lower memory footprint for data the database can operate on directly, more options to parallelise data access, and the use of vector instruction set extensions at CPU level (working on more data per cycle).

Maybe it's a good start to check out the available documents on delta merge, e.g. http://www.saphana.com/docs/DOC-2002.
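To see this in action you can check the delta part of your table and trigger the merge yourself. A sketch (the MERGE DELTA statement and the record-count/size columns are as I remember them, so please verify on your revision):

select raw_record_count_in_main, raw_record_count_in_delta, memory_size_in_delta
from m_cs_tables
where schema_name='<schema_name>' and table_name='<table_name>';

merge delta of <table_name>;   -- moves the delta rows into the compressed main store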

Cheers,

Lars

Former Member
0 Kudos

Lars,

thank you for this info. It gives some direction for further discussion.