Latch Free B-trees (ASE 16sp02 MemScale option feature)

First, a bit of history.  Latches are essentially a form of concurrency control that is a companion to locks in a lock-based concurrency control implementation.  However, where locks are transactional - i.e. if you grab a lock, you hold it until the end of your transaction - latches are held just for the duration necessary to accomplish a discrete step.  While latches are used elsewhere in ASE, they are most commonly associated with indexes on DOL tables.

There is a bit of history about that as well.  Back in the old days - before 11.9.2 - ASE experienced a lot of contention and deadlocks between applications.  Engineering did a bit of a study and noted that the leading cause of the contention/deadlocks wasn't the data pages themselves, but rather the indexes - and specifically the index leaf pages.  The reason was that when two users attempted to insert different rows into the same table - even rows scattered across the table - an index key on a datetime or sequential key number column would often incur blocking at the index leaf page level.  As a result, in 11.9.2, along with datarows and datapages locking, engineering eliminated locking on index pages and replaced it with non-transactional latches.  Early documentation noted that this would reduce contention, as a process would only hold a latch on an index page long enough to make the necessary modification and then release it immediately without waiting for a commit.

There are two types of latches - read (shared) and write (exclusive).  Similar to a lock, when you simply need to read an index page, you grab a shared latch (SH_LATCH) on it.  When you need to change an index page, you grab an exclusive latch (EX_LATCH) on it.  That way, when reading an index page, the page stays consistent throughout the read, as no one can change a value right while you are trying to read it.  Similarly, if you need to change an index key, you can make your change without worrying about whether someone else is trying to change the same key at the same time.

Over time, however, as transaction volumes have exploded in the nearly two decades since this was implemented, we are once again seeing index latch contention as a scalability bottleneck.  There are two ways to detect this.  The first is in sp_sysmon, in the task context switching section.  The second is in monSysWaits or monProcessWaits (preferred), where you will see high Wait values for WaitEventID=41 (see the sample query after the list below).  Latch contention can generally be divided into two types:

  • Write/Write contention - two writers collide
  • Read/Write contention - a writer is blocked by a reader or vice versa
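
As a quick check, a query along these lines against the MDA tables shows how much wait time is attributable to latches - a minimal sketch; the join to monWaitEventInfo simply pulls in the description text for WaitEventID 41:

select w.WaitEventID,
  i.Description,
  w.Waits,
  w.WaitTime
  from master..monSysWaits w, master..monWaitEventInfo i
  where w.WaitEventID = i.WaitEventID
    and w.WaitEventID = 41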

The big question is how to find the source of the contention.  There are several likely sources of latch contention - the first, and least likely, is allocation pages.  If you see a lot of latch contention with APL tables, it could be this case, in which case you might need to look into dbcc tune(greedyalloc).  The underlying cause is a lot of concurrent space allocations for different tables - which may be controllable by changing the number of preallocated extents.  However, this is the least likely source of latch contention.

The second is the indexes on datarows-locked system tables.  While a bit more likely than the former, the most common cause of this is highly concurrent object creation/deletion or similar activity which, due to indexes on columns such as "id", results in contention.  The most likely case is highly concurrent create table #temp commands in tempdb.  The only real solution here is likely to use multiple tempdbs, as sketched below.  As with the first, the likelihood of this being the cause is fairly small.
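
Setting that up looks roughly like the following - a minimal sketch assuming a device named tempdb_dev2 and a login named batch_user, both of which are hypothetical names used purely for illustration:

-- create an additional temporary database on its own device
create temporary database tempdb2 on tempdb_dev2 = 2048
go
-- bind a login ('lg') to the new temporary database so its #tables land there
sp_tempdb 'bind', 'lg', 'batch_user', 'DB', 'tempdb2'
go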

The third and most common cause of contention is on indexes for DOL locked tables.  Unfortunately, unlike locks, latches are not tracked in monLocks or syslocks.  As a result, there is no easy way to find the exact points of contention.  However, there is a bit of a trick that can find the likely candidates quite accurately.

First, we have to understand indexing 101.  Indexes are always maintained in sort order.  Consequently, indexes on columns with monotonically increasing (or decreasing) values will have all of the index insertions at the last page.  Think trade_date, order_number and similar columns.  The second indexing basic is that we only attempt to store an index key one time.  For nonunique indexes, duplicate key values are stored once with a list of RIDs (row IDs).  This fact is a bit hard to find, but it is located in section 4.6.1, Factors that can Affect Storage Size, of the ASE Performance and Tuning Series: Physical Database Tuning manual, in a bullet that states:

  • Duplicate keys are stored only once, followed by a list of row IDs for the key.

Now this is interesting - if a new row is added with an existing key, the RID list needs to expand.  If ASE used row-level latches for index access, this would result in index row forwarding and a real performance nightmare.  However, with page latching, a process can expand the RID list by simply rewriting the rest of the page, shoving the subsequent index rows down.

So there we have our main latch contention causes:

a) concurrent updates/appends to monotonically increasing index key values

b) concurrent updates/appends to duplicate index key values

c) any attempt to read an index page during any write operation on same page

d) any attempt to write to an index page during any read operation on same page

Now then, let's outline the process for finding candidates:

1) Verify that index latch contention is a fairly big problem (top 5 in WaitEvents or context switching)

2) Find the most volatile indexes that have high logical reads

3) Obtain the index keys for those candidate indexes

4) Validate that either

  a) a monotonic sequence is likely, or ...

  b) highly duplicative keys are likely, or...

  c) there are a lot of range/covered scans

Once verified, you have a choice of fixing the index, dropping the index or converting to a latch-free b-tree index.

Let's walk through an example a colleague sent me.

The customer complained of a slow system - looking at sp_sysmon, you can see CPU is fairly low:

Engine Busy Utilization        CPU Busy   I/O Busy       Idle
  ------------------------       --------   --------   --------
    Engine 0                       28.7 %      0.5 %     70.8 %
    Engine 1                       23.9 %      0.8 %     75.3 %
    Engine 2                       27.9 %      0.8 %     71.3 %
    Engine 3                       26.5 %      0.5 %     73.0 %
    Engine 4                       26.7 %      0.8 %     72.5 %
    Engine 5                       25.6 %      2.0 %     72.4 %
    Engine 6                       23.3 %      0.6 %     76.1 %
    Engine 7                       21.2 %      0.8 %     78.0 %
    Engine 8                       27.1 %      0.8 %     72.1 %
    Engine 9                       22.7 %      0.6 %     76.7 %
    Engine 10                      25.5 %      0.9 %     73.7 %
    Engine 11                      28.3 %      0.8 %     70.9 %
    Engine 12                      30.2 %      0.5 %     69.3 %
    Engine 13                      21.7 %      0.5 %     77.7 %
    Engine 14                      24.9 %      1.0 %     74.1 %
    Engine 15                      26.1 %      1.5 %     72.4 %
    Engine 16                      30.4 %      0.6 %     69.0 %
    Engine 17                      25.0 %      0.5 %     74.5 %
    Engine 18                      23.4 %      0.7 %     75.9 %
  ------------------------       --------   --------   --------
  Summary           Total         488.9 %     15.3 %   1395.8 %
                  Average          25.7 %      0.8 %     73.5 %

Now this is not that abnormal for a system experiencing performance problems.  However, most folks focus on high CPU - but in reality, CPU is an indicator whether high OR low.  High CPU can be driven by one of 9 things (LIO, sorting, compilation, spinlocks, etc.).  Low CPU is generally caused by slow IO or blocking/contention.  For us, with sp_sysmon this is easy - we just check the task context switching section:

Task Context Switches Due To:
                                  per sec      per xact       count  % of total
    Voluntary Yields                163.1           0.1       97866       3.6 %
    Cache Search Misses              46.1           0.0       27644       1.0 %
    Exceeding I/O batch size          0.1           0.0          76       0.0 %
    System Disk Writes               20.6           0.0       12340       0.5 %
    Logical Lock Contention           0.8           0.0         509       0.0 %
    Address Lock Contention          11.2           0.0        6730       0.2 %
    Latch Contention               1151.9           0.4      691132      25.4 %
    Log Semaphore Contention         84.5           0.0       50674       1.9 %
    PLC Lock Contention               0.7           0.0         391       0.0 %
    Group Commit Sleeps               3.2           0.0        1909       0.1 %
    Last Log Page Writes             63.1           0.0       37854       1.4 %
    Modify Conflicts                 19.6           0.0       11759       0.4 %
    I/O Device Contention             0.0           0.0           0       0.0 %
    Network Packet Received         631.8           0.2      379107      13.9 %
    Network Packet Sent             991.9           0.3      595128      21.9 %
    Network services               1577.1           0.5      946244      34.8 %
    Other Causes                   -230.4          -0.1     -138224      -5.1 %

Now this is a poster child.  Note that when looking at task context switches or wait events, you can ignore network packet received (WaitEventID=250) as this is common.  If using monSysWaits/sp_sysmon, you can also ignore system-related info such as checkpoint idle time, etc. (or Network services above).  Then, if Latch Contention is in your top 5, you likely have a significant problem which needs some attention if you want to improve performance.  The trick then is to run a query similar to:

select top 20 DBName,
  object_name(ObjectID,DBID) as TableName,
  ObjectName as IndexName,
  IndexID,
  LogicalReads,
  RowsInserted+RowsDeleted as IndexMods,
  UsedCount
  from master..monOpenObjectActivity
  where IndexID>1 and IndexID<250
  order by 6 desc

Assume we get back a list like:

TableName      IndexName                      IndexID     IndexMods
-------------- ------------------------------ ----------- ------------
MessageProcess MessageProcess_Idx1                      3       198828
NotficationTbl NotficationTbl_SentStatus_idx            2        96794
UserSession    UserSession_CreateTime_idx               3        78623
NotficationTbl NotficationTbl_CreationTS_idx            4        32259
NotficationTbl NotficationTbl_ID_idx                    3        32259

The above is a shortened example.  One trick (as we will see later) is to use LogicalReads to help score which indexes to focus on.  However, for simplicity's sake, we will just focus on the number of modifications.  Some may question why we only used RowsInserted and RowsDeleted and not RowsUpdated....wellllll....that's because indexes never have RowsUpdated>0.  The simple reason is that indexes are sorted, and therefore each index update is treated like a deferred update - a delete/insert pair - even if it lands on the same row (remember, we may have to rewrite the index page when expanding the RID list).

Since the first culprit is the biggest, we simply do an sp_helpindex on the MessageProcess table to get:

index_name          index_keys                                index_description
------------------- ----------------------------------------- -----------------
MessageProcess_PK    SourceID, MessageID, EventNum, HandlerID clustered, unique
MessageProcess_Idx1  CreateTime                               nonclustered

This one is almost a gimme.  "CreateTime"....hmmmmmm....I would be willing to bet that its values are monotonically increasing.  Similarly, if we look at the UserSession table for the third index above, we see a similar situation:

index_name                 index_keys         index_description
-------------------------- ------------------ -----------------
UserSession_PK              UserID, SessionID clustered, unique
UserSession_CreateTime_idx  CreateTime        nonclustered

That is two of the top 5 that *may* be candidates for latch-free b-tree.  Now let's take a look at the NotficationTbl table's indexes:

index_name                    index_keys         index_description
----------------------------- ------------------ -----------------
NotficationTbl_SentStatus_idx  SentStatus        nonclustered
NotficationTbl_ID_idx          ID                nonclustered
NotficationTbl_CreationTS_idx  CreationTimestamp nonclustered

The last two are similar to the first two - likely monotonic sequences.   The first one is a bit fun - SentStatus.  This likely is very low cardinality - e.g. Waiting, InProgress, Sent, Archive....or similar.   An index is somewhat desirable as we likely have a polling process that looks for all the 'Waiting' notifications and grabs the next one.  Hopefully using readpast locking.   Ignoring that, you can probably see the problem.  If we have 10 users concurrently adding messages to the queue, those 10 are all updating the same index row - Waiting...just appending to the RID list.  Similarly, if we have 10 users sending messages, all 10 update (first) the InProgress index row RID list and then the 'Sent' index row RID list.   Aiyeeeyiyiyi.   This is where partitioning the queue into high, medium, low or east, central, west or A-H,I-S,T-Z or some scheme would help.
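
For reference, that polling pattern looks roughly like the sketch below - a minimal example assuming the ID and SentStatus columns from the index listing above, an int datatype for ID, and the literal status values, all of which are assumptions:

declare @next_id int
begin tran
-- readpast skips rows locked by other pollers instead of blocking behind them
select @next_id = min(ID)
  from NotficationTbl readpast
  where SentStatus = 'Waiting'
update NotficationTbl
  set SentStatus = 'InProgress'
  where ID = @next_id
commit tran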

The big question is whether latch-free b-tree would help here or not.

The answer is in how the Latch-Free B-Tree (LFB) is implemented.  In ASE 16sp02, the LFB is implemented using a per-database mapping table into which all dirty index pages are hashed.  This mapping table contains the address of the most recent modification to the index page, which a reader folds into the original page to derive a page image that can be read.  After 16 modifications to the same index page, a delta merge process merges the changes back into the original page.  Because of this versioning, the SH_LATCH for reads is eliminated - the mapping table contains the latest completed change to the page.  When a writer makes a modification, it does so by copying the latest version of that row, creating a new version, and then using the Compare & Swap (CAS) atomic instruction to overwrite the address in the mapping table with the new address.

There is a bit of a problem though.  While the SH_LATCH is eliminated, the EX_LATCH is not.  This has as much to do with Write Ahead Logging and the PLC queue buffer/unpin problem as anything else.  However, it is also reasonable to expect that absolutely true concurrency would be impossible.  For example, with a sequential key as a unique index, true concurrency could result in two tasks updating different values (e.g. 3 & 4) to the same value (e.g. 5), which would violate index uniqueness.  Additionally, with two inserts of two different rows that both have the same status, so long as we are using the RID list, we have an issue.  Remember, while MVCC says writers don't block readers and readers don't block writers...writers DO block writers.

Sooooo....remember our sources of latch contention:

a) concurrent updates/appends to monotonically increasing index key values

b) concurrent updates/appends to duplicate index key values

c) any attempt to read an index page during any write operation on same page

d) any attempt to write to an index page during any read operation on same page

The only ones we can do anything about are the latter two, (c) & (d).  This is more of a benefit than you might think at first.  Keep in mind that the implication is that a volatile index that is highly read will benefit - and typically this is one of the most useful indexes, such as a primary key or date-based index, where a typical date range scan could impede writes.  Similarly, all those polls for 'Waiting' statuses could impede write operations.  If you think about it, the entire reason for the indexes in the first place was reads - frequent enough to drive the need for an index....so it is just as likely that those reads are causing the majority of the latch contention.  In fact, in our testing with indexes with high latch contention, we have been able to improve performance by ~2x by using latch-free b-trees.  Of course, for indexes being used for range scans or covered scans the gain could be even better.  Range scans or covered scans can be spotted from the ratio of LogicalReads to UsedCount.  If it is <10, the index is pretty much a point query index and not likely to see much benefit.  However, if LogicalReads outweighs UsedCount by 100x and it is one of the more volatile indexes, the effect of turning on latch-free b-tree will be that much greater.
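
To score candidates that way, a variation of the earlier query can compute the ratio directly - a minimal sketch against the same monOpenObjectActivity columns (the +1 simply guards against dividing by zero on never-used indexes):

select top 20 DBName,
  object_name(ObjectID,DBID) as TableName,
  IndexID,
  RowsInserted+RowsDeleted as IndexMods,
  LogicalReads,
  UsedCount,
  LogicalReads/(UsedCount+1) as ReadsPerUse
  from master..monOpenObjectActivity
  where IndexID>1 and IndexID<250
    and RowsInserted+RowsDeleted>0
  order by 4 desc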
