Solved: feedback about a document on data storage and how to track space allocation

Former Member · ‎10-08-2015

Hi community,

I just came across this document on the Web:

http://www.softwaregems.com.au/Documents/Sybase%20GEM%20Documents/Sybase%20Data%20Storage%20&%20Frag...

It helped me understand how data are stored, but I didn't get all the points.

Have you ever read it? Do you have any comments about the content?

Based on it, I'd like to see for myself how data are physically stored depending on the object type (APL / DOL tables; CI / NCI; LOB...).

Could you tell me which command I should use to see, for a given structure (table / index / LOB...), the allocation units, the OAM and the page chain?

I found 'dbcc listoam' but I believe there are others.

Thanks in advance.

Simon

former_member182259 · ‎10-11-2015

Unfortunately, that document is full of a lot of inaccuracies as well. Segments really make sense when using different classes of storage - e.g. if you have tiered storage and can reserve tiers without worrying about migration, then segments are useful for specifying locations for partitions or indexes - especially text indexes. However, unless you can control the underlying storage locations, using segments can be an exercise in frustration - especially if the storage system automatically moves data between storage tiers.

One of many fallacies in the paper was that storage fragmentation (and by inference the need for segments) is driven by the fact that table and index storage is intermingled. This is actually false. When any object is defined for storage, we preallocate a set number of extents for precisely that object/index - by default (and minimum I think) this is 2. However, users often find that larger is better as it can reduce hits on the OAM during high txn rates.

An APF can be of *any* size - assuming you have a corresponding cache pool of that size (e.g. 16K), and a large IO will also be the size of any cache pool larger than page size. However, a large IO is best thought of as an extent IO - we will only read an extent at a time and no larger chunks, and all the large IO is done within the boundaries of an extent - meaning a large IO will never span more than one extent. Given an extent allocation scheme and the number of preallocated extents, this means that fragmentation of multiple tables or multiple indexes has no relevance to whether large IO is useful or not.

It *DOES* have an impact on IO concurrency and device contention - but that largely depends on OS, volume manager, storage subsystem, etc. This is another area where segments can help as they can distribute writes to different OS devices and thus bypass IO concurrency limits at the device/volume level.
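To make the "a large IO never spans more than one extent" point concrete, here is a minimal arithmetic sketch in Python. It relies on the standard ASE layout of 8 logical pages per extent and 256 pages per allocation unit. The helper names (`locate`, `large_io_pages`) are made up for illustration, and the alignment of a large IO on a multiple of its own size is an assumption of the sketch, not something stated above.

```python
PAGES_PER_EXTENT = 8        # an ASE extent is 8 logical pages
PAGES_PER_ALLOC_UNIT = 256  # an allocation unit is 256 logical pages (32 extents)


def locate(page_no):
    """Return (allocation unit number, extent number) for a logical page."""
    return page_no // PAGES_PER_ALLOC_UNIT, page_no // PAGES_PER_EXTENT


def large_io_pages(page_no, io_pages):
    """Return the pages covered by one large IO that contains page_no.

    Assumption (illustrative): the IO is aligned on a multiple of its own
    size. With io_pages in (1, 2, 4, 8) the block then always falls inside
    a single 8-page extent.
    """
    if io_pages not in (1, 2, 4, 8):
        raise ValueError("io_pages must be 1, 2, 4 or 8")
    start = (page_no // io_pages) * io_pages
    return list(range(start, start + io_pages))


if __name__ == "__main__":
    for io_pages in (2, 4, 8):
        pages = large_io_pages(1029, io_pages)
        extents = {locate(p)[1] for p in pages}
        print(io_pages, pages, "extents touched:", sorted(extents))
```

Whatever else is allocated in the neighbouring extents, the set of extents touched by one IO always has exactly one member, which is why intermingling of objects across extents does not change whether large IO is usable.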

And anything mentioning DOL (DPL or DRL) is highly suspect in that paper. Reading through it, I can only conclude that while the author is familiar with the basics of row forwarding, that is about all. The description of placement indexes (of which a clustered index is precisely that) is more than slightly inaccurate, as a DOL table with a clustered index will NEVER operate as a heap unless the clustered index is on a monotonic key (such as dates - e.g. txn dates) or on really small tables where the placement rules all result in the same allocation unit or extent due to the small size.

My recommendation would be to start with the ASE P&T Physical Database Tuning doc and then play around with dbcc page() with some tests to see what is happening. However, temper ideals with a cold dose of reality. Yes, city is a required field for an address, so some find it tempting to make it char(30) - but the reality is that varchar(30) is likely better, even if not null, as you can save quite a bit of storage and reduce IOs over 100's of millions of rows. And if a column might have an unknown value - then a "null" column is perfectly acceptable vs. being silly and forcing a not null and putting in spaces just for an unknown value.
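For a rough feel of the char(30) vs. varchar(30) difference, a back-of-the-envelope sketch in Python follows. All the numbers are assumptions chosen for illustration: an average city name of 10 characters, single-byte characters, and 1 byte of per-row overhead for the variable-length column (the real overhead depends on the locking scheme and row format). The function name `column_bytes` is hypothetical.

```python
def column_bytes(rows, declared_len, avg_len, var_overhead=1):
    """Estimate total bytes used by one column stored as char vs. varchar.

    rows         -- number of rows in the table
    declared_len -- declared length, e.g. 30 for char(30) / varchar(30)
    avg_len      -- assumed average length of the actual values
    var_overhead -- assumed per-row bookkeeping bytes for a varchar column
    """
    fixed = rows * declared_len                 # char pads every value
    variable = rows * (avg_len + var_overhead)  # varchar stores actual length
    return fixed, variable


if __name__ == "__main__":
    fixed, variable = column_bytes(rows=100_000_000, declared_len=30, avg_len=10)
    print("char(30):   ", fixed, "bytes")
    print("varchar(30):", variable, "bytes")
    print("difference: ", fixed - variable, "bytes")
```

Under these assumed inputs the single column differs by close to 2 GB over 100 million rows; fewer bytes per row means more rows per page and so fewer IOs for the same scan.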

Former Member · ‎04-01-2016

Straight out of the internals course materials, Bret. Cool! Jean-Pierre

kevin_sherlock · ‎10-10-2015

That doc is the fine work of Derek Asirvadem. His email is in the doc if you dare to ask any questions. Derek's stuff is always a fascinating read. Mixed in with some really good info, there seems to be a gratuitous, relentless rant against DOL tables and "Placement Indexes". Aside from that, some really good stuff in that doc. I'm curious which "points" you did not quite get? Was it one of these?:

- "During the discussion of logical or physical DataStructures, non-technical terms such as 'table','base table' and 'object-index pair' are too ambiguous to be meaningful: those who use them are committed to your continued confusion."

- "The use of the term "clustered" Index in relation to DOL tables is therefore incorrect, confusing, and fraudulent."

- "Sites that use such tables generally do not use Segments, and thus all DataStructures in the entire database is fragmented across the single default Segment. Florists call this "striped", and wonder why it is slow; engineers call it retarded."

- "Based on the naïve belief that Evangelists Preach the Gospel, while ignoring the fact that Evangelism is a marketing concept, and in substitution of genuine knowledge and technical examination: Myth that the DOL Placement Index (unfortunately addressed via the "clustered" syntax), is the same as the Clustered Index."

or my personal favorite:

- "A man and a woman are meant to be married; together they achieve more than each achieves separately. Implementing APL tables without a Clustered Index, is analogous to a divorced couple. Likewise, there is no fidelity in non-unique Clustered Indices "

Ah yes. That's all pure Derek.

Go ahead, email him your questions. Let us know how that goes for you.

Former Member · ‎10-09-2015

Simon,

Thanks for the link - a really good detailed description.

Not sure how keen you are to see the actual underlying representation of the data, but you could ...

create a database using an underlying file (not a raw device)

create a table in this database

populate the table with a few rows with a string you can search for

shut down the database

do a hex/octal dump of the database file and search for the data you've inserted

It'll be difficult to get much out of this for the indexing location - but it might help.

I've only ever used it to understand how data rows are stored on disk, but I'm sure it's possible if you put in enough effort and time.
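The search step above can be sketched with a few lines of Python instead of eyeballing a full dump. This is only an illustration: the function names (`find_marker`, `hexdump`) and the marker string are made up, and the 2K page size is an assumption (use whatever logical page size your server was built with). The page slot it reports is simply the byte offset divided by the page size, which is not necessarily the logical page number that dbcc reports.

```python
def find_marker(data, marker, page_size=2048):
    """Return (byte offset, page slot, offset within page) for every hit."""
    hits = []
    pos = data.find(marker)
    while pos != -1:
        hits.append((pos, pos // page_size, pos % page_size))
        pos = data.find(marker, pos + 1)
    return hits


def hexdump(data, start, length=64, width=16):
    """Return classic hex + ASCII dump lines for data[start:start+length]."""
    lines = []
    end = min(start + length, len(data))
    for off in range(start, end, width):
        chunk = data[off:min(off + width, end)]
        hexpart = " ".join("%02x" % b for b in chunk).ljust(width * 3)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %s %s" % (off, hexpart, text))
    return lines


if __name__ == "__main__":
    # Synthetic stand-in for a device file; for a real file you would
    # read its contents with open(path, "rb") after shutting down.
    data = b"\x00" * 5000 + b"FINDME_0001" + b"\x00" * 100
    for offset, page, in_page in find_marker(data, b"FINDME_0001"):
        print("offset", offset, "page slot", page, "offset in page", in_page)
        print("\n".join(hexdump(data, offset - 16, 48)))
```

Inserting rows with a distinctive string such as the hypothetical `FINDME_0001` makes the hits unambiguous, and dumping a few bytes before each hit shows the row header that precedes the column data.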
