
As promised, here's the second part of my short excursion into the shoals of lock management with MaxDB databases. (The first part can be found here: MaxDB - On locking mechanisms and how we get to know our MaxDB a bit better each day... Part I)

2) When deadlocks go unnoticed...

Ok, big buzz word in the title - I'm sure that just the word 'deadlock' will get me the audience this time 😉

Before we start with the actual point, let's get clear about what is meant by 'deadlock'.
Among the huge variety of possible locking and hanging situations parallel working systems can get into, deadlocks are very specific.
The point of deadlocks is not that the system is hanging for a long time, but that it is impossible for the processes involved in the deadlock to resolve it by themselves.

Locking situations can be thought of as (mathematical) graphs: processes and resources are the vertices, and lock ownership and lock requests are the edges. A deadlock can then be defined as a closed cycle in such a graph.
The simplest example, with the minimum number of vertices, looks like this:

Process A [PA] holds Resource A [RA]
Process B [PB] holds Resource B [RB]

    ([PA],[RA]) <--- lock request --- [PB]
    [PA] --- lock request ---> ([PB],[RB])


In this case PA and PB each wait for the other to release the requested resource. But since both are waiting, neither can release its locks - this is a deadlock.

Of course deadlocks can be way more complex, involving many resources, more processes, and sometimes even multiple application layers (these are really nasty, since there is usually no coherent view of such cross-layer locks).

One advantage of this rather abstract view of deadlocks is that it makes them easier to recognize.
That is what's behind the deadlock detection feature of current DBMSs.

Whenever a process has been waiting for a resource for some time (say, 1 second or so), the DBMS looks for such a deadlock graph and, if it finds one, 'resolves' the situation by telling one of the waiting processes that it won't get the lock.
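
MaxDB exposes its configuration through system views, so you can check how deadlock detection is set up on your system. This is only a sketch - both the view name SYSINFO.ACTIVECONFIGURATION and the parameter name pattern are assumptions on my part, so verify them against the documentation of your release:

select parametername, value
from sysinfo.activeconfiguration       -- assumption: configuration view of MaxDB >= 7.7
where parametername like '%DEADLOCK%'  -- assumption: parameter naming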

The general idea behind this feature is of course not to prevent deadlocks.
Deadlocks are usually design errors - bugs in the application program - and those cannot be fixed automatically.

However, it is important for heavy-duty databases to keep running as long as possible.

To make this possible, deadlock detection and resolution help a great deal.
Once a deadlock is removed, the whole system can continue its work, while only one transaction receives an error.

So far the story is rather nice, isn't it?

The DBMS checks for deadlocks and makes sure that the system will stay responsive even if the application designers made a mistake.

Unfortunately nothing is perfect - and neither is the deadlock detection in MaxDB.
As you may know (or will learn now), MaxDB distinguishes several kinds of SQL locks (the first two can also be requested explicitly, as the sketch after the list shows):

  • Table locks
  • Row locks
  • Dictionary/Catalog locks
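
Table and row locks can also be requested explicitly per SQL, while catalog locks are taken implicitly by DDL statements. Here's a small sketch - the statement forms are from the MaxDB SQL reference as I remember it, so double-check them for your release (locka is one of the demo tables used below):

lock table locka in exclusive mode
-- requests an explicit table lock on locka

select * from locka where cola=1 with lock exclusive
-- requests an exclusive row lock already at read time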

As long as the deadlock involves only table/row locks, everything works just as expected:

#### Session 1 (Isolation level 1, Autocommit off)
select * from locka
COLA  COLB
1     X
2     X
select * from lockb
COLA  COLB
1     Y
2     Y
update lockb set colb='YX' where cola=1 


#### Session 2 (Isolation level 1, Autocommit off)
update locka set colb='XY' where cola=1

#### Monitoring session
select session, tablename, lockmode, lockstate, rowidhex from locks
SESSION  TABLENAME  LOCKMODE       LOCKSTATE  ROWIDHEX
8459     LOCKB      row_exclusive  write      00C1100000000...
8460     LOCKA      row_exclusive  write      00C1100000000...


Nothing special up to here - let's create a deadlock:

#### Session 1
update locka set colb='XY' where cola=1


#### Session 2
update lockb set colb='YX' where cola=1
Auto Commit: Off, SQL Mode: Internal, Isolation Level: Committed
 General error;600 POS(1) Work rolled back
update lockb set colb='YX' where cola=1



As we can see, the crosswise row lock request (an update requires an exclusive lock) is recognized and one session is rolled back.

Now let's do this again, but let's use shared (catalog) locks as well...

#### Session 1
update lockb set colb='YX' where cola=1


#### Session 2
update locka set colb='XY' where cola=1


#### Session 1
alter table locka add (colc varchar(10))
--> hangs ! 


#### Monitoring session
select session, tablename, lockmode, lockstate, rowidhex from locks
SESSION  TABLENAME      LOCKMODE       LOCKSTATE  ROWIDHEX
8459     LOCKA          row_exclusive  write      00C110000000000...
8460     SYS%CAT2       row_share      ?          FFFF00000000000...
8460     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8460     LOCKB          row_exclusive  write      00C110000000000...
 

Wow!

Besides our two already known row_exclusive locks on tables LOCKA and LOCKB we also find one for SYSDDLHISTORY and a row_share lock for SYS%CAT2.

What are those about?
Well, the lock on SYSDDLHISTORY belongs to an insert statement that is executed automatically by MaxDB >= 7.7 whenever a DDL statement is issued.
That way the SYSDDLHISTORY table records all committed DDL statements - a neat feature, but one that has nothing to do with what we want to do here.
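
By the way, you can have a look at what was recorded there. Just a sketch - the column layout of SYSDDLHISTORY differs between releases, which is why no column names are assumed here (you may also need to qualify the schema):

select * from sysddlhistory
-- lists the committed DDL statements recorded so far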
The SYS%CAT2 lock, in turn, is the catalog lock mentioned above.

Now let's create the deadlock:

#### Session 2
alter table lockb add (colc varchar(10))
--> hangs ! 


#### Monitoring session
select session, tablename, lockmode, lockstate, rowidhex from locks
SESSION  TABLENAME      LOCKMODE       LOCKSTATE  ROWIDHEX
8459     SYS%CAT2       row_share      ?          FFFF00000000000...
8459     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8459     LOCKA          row_exclusive  write      00C110000000000...
8460     SYS%CAT2       row_share      ?          FFFF00000000000...
8460     SYSDDLHISTORY  row_exclusive  write      00FFFE000000000...
8460     LOCKB          row_exclusive  write      00C110000000000...
select tablename, h_applprocess as holder, h_lockmode,
r_applprocess as requestor, r_reqmode from lock_waits
TABLENAME  HOLDER  H_LOCKMODE     REQUESTOR  R_REQMODE
LOCKB      4132    row_exclusive  1904       sys_exclusive
LOCKA      1904    row_exclusive  4132       sys_exclusive 


Now this is in fact a deadlock, but MaxDB does not do anything about it.

The reason for that is simple:
The deadlock detection does not include the share locks!

To be precise, for share locks the kernel does not maintain a list of session IDs, but only a single counter.
Based on this counter it is not possible to find out which sessions are holding or waiting for a specific share lock, and consequently the kernel cannot tell which task to roll back.
In this case one user task needs to be cancelled manually, or the lock timeout will deny the first of the waiting requests.
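
Until the deadlock detection covers share locks, a monitoring query can at least make such situations visible. This is only a sketch based on the LOCK_WAITS view we used above; it finds the simple two-session cycle from our example (longer cycles would need additional join steps):

select a.tablename,
       a.h_applprocess as holder,
       a.r_applprocess as requestor
from lock_waits a, lock_waits b
where a.h_applprocess = b.r_applprocess
  and a.r_applprocess = b.h_applprocess
-- any row returned means two sessions are waiting for each other

If this query returns rows, one of the sessions involved has to be cancelled manually, as described above.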

Although this is an ugly limitation of the deadlock detection, it's not really that bad in day-to-day DB usage.
The reason is simply that there are usually only a few DDL commands running in parallel - especially when it's not an upgrade weekend.

3) The dead walk - how deleted rows reappear

Ok, one last thing 🙂

It's a simple effect that I found to be surprising while I was playing around with locks during the 'research' phase for this blog.

#### Session 1
select * from locktest
THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x
delete from locktest where the_row >='5'
More than one row updated or deleted. Affected Rows:  5
-> SEE: no commit here!


#### Session 2
select * from locktest
THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
 

Where is the data?

#### Session 1
rollback 


#### Session 2
select * from locktest
THE_ROW  THE_ROW2
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x 
 

There it is!

This is a really nasty surprise if you come from another DBMS like Oracle.
MaxDB currently (!) does not support consistent-view concurrency and it does not reconstruct deleted rows.
Since deletions are done in place during statement execution (and not at commit time), the deleted rows are really just gone when the second session looks into the table.
There is nothing to tell the second session to look for old data - the data is simply gone.

If your application really relies on a consistent view of the data, free of data access phenomena like 'dirty reads', 'non-repeatable reads', etc., then you either need to use a higher transaction isolation level (but lose scalability that way) or make your application aware of this behavior.
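
Both options look roughly like this in MaxDB SQL - the user name and password are placeholders, and the statement forms should be verified against the SQL reference of your release:

connect myuser identified by secret isolation level 3
-- placeholder credentials; isolation level 3 (serializable) avoids the
-- phenomena above at the cost of lock contention and thus scalability

select * from locktest with lock exclusive
-- the alternative at a low isolation level: explicitly lock the rows
-- the application needs to remain stable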

Looking back

As we've seen, locking is not really something that is 'just there'.
It can become pretty important to be able to distinguish between what locking can do for you and what it won't do.

One important thing I have not mentioned explicitly yet: I've only been writing about SQL locks here. But MaxDB (like other DBMSs) relies on multiple different shared resources that need to be protected/serialized as well.

For that task MaxDB uses B*Tree-locks, critical regions, semaphores & mutexes, filesystem locks and the like.

So there's plenty of topics to write about ...

Resources

For more information on this area of MaxDB please check these resources:

MaxDB Internals Course - Locking

SAP Note #1243937 - FAQ: MaxDB SQL-Locks

MaxDB Documentation - Locks

Marketing

If you're not already booked for October 27-29 this year, happen to be in Vienna, and keep asking yourself what to do ... then get your ticket for SAP TechEd 2009 and make sure to attend my MaxDB session!

In addition to the presentation there will be an expert session in the afternoon, where I'll await your questions - which I hopefully can answer.
It's session EXP349 MaxDB Q&A, Tuesday, 2:30 p.m., that you should register for.

