From the Archives: What Exactly Is Thread Deadlock...

JasonHinsperger · ‎05-21-2014

In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about thread deadlock in SQL Anywhere, and how application design can cause the problem. Note that more recent versions (16 and later) of SQL Anywhere dyanamically adjust the mutli-programming level so there is no longer a need to set the -gn server option in most situations. This can also mask the poor design that Glenn talks about below, but it cannot .

Thread deadlock is a specific error (SQLCODE -307, SQLSTATE '40W06') that a SQL Anywhere server will return for a specific request. In this post I want to document why and how thread deadlock can occur, and the mechanisms that can be used to diagnose the problem.

SQL Anywhere's threading architecture

Like other database management systems, SQL Anywhere implements its own threading architecture rather than rely solely on the threading model of the underlying operating system. Because SQL Anywhere supports a wide variety of OS and hardware platforms - Windows, Linux, Windows CE, Sun OS, AIX, HP-UX, and Mac OS/X to name a few - SQL Anywhere will utilize "lightweight" threads (often called fibers) on the operating systems (ie. Windows, Linux) that support them, and regular OS threads on those OS platforms that do not.

Moreover, in SQL Anywhere the server does not dedicate a thread (fiber) to a specific connection. Rather, a fixed-size pool of server threads are assigned dynamically to tasks as they enter the server for execution. Often a task is an SQL statement from an application or stored procedure, but there are many different types of tasks that a thread can service. Once a task is scheduled on a thread (fiber), that thread is assigned to process that task until the task completes or is cancelled.

By default, SQL Anywhere creates 20 threads when the server is started (3 on Windows CE). This default can be changed by using the -gn command line switch. In effect, the number of threads determines the server's multiprogramming level - the maximum number of tasks that can be active at any one time. Server threads are independent of the number of connections made to any database on that server. Hence a given thread (fiber) can first service a task for one database, and subsequently service a task for a connection to a different database.

Thread deadlock - the condition

Threads in the SQL Anywhere server service tasks, which ordinarily are database requests such as PREPARE, DESCRIBE, OPEN, FETCH. Often these tasks can be serviced very quickly; sometimes they take considerably longer, such as OPENing an INSENSITIVE cursor over a large result set. At any one point, the thread servicing that task may be executing a query access plan operator, marshalling result expressions into output buffers, waiting for an I/O operation to complete, or it may be blocked on a shared resource: for example, a schema lock or a row lock.

Given a multiprogramming level of n, thread deadlock is the situation where n-1 threads (fibers) are servicing active tasks but are blocked, and the nth thread (fiber), also servicing an active task, is about to block. The server must prevent all threads (fibers) from being blocked since this would result in a "hung" engine - no threads would be available to perform any work since all are blocked, no new connections could be handled and all new tasks would be queued.

This situation differs from "true" deadlock in the sense that in "true" deadlock two or more threads complete a cycle of dependencies such that none of the threads (fibers) can continue. With thread deadlock, however, it is possible for completely unrelated SQL requests to be blocked, each tying up a server thread (fiber), so that if the nth thread attempts to block the SQL request will receive the -307 error. Recall that the set of threads (fibers) in the server service all SQL requests, even for those connections connected to different databases - so thread deadlock can occur due to the combined workload of each of the databases.

Busy servers that service tens or hundreds of connections may experience thread deadlock in cases where many requests are long-running, either due to the size of the database or due to blocking. In this case, an appropriate remedy is to increase the server's multiprogramming level by restarting the server with a higher value for the -gn command line switch.

All too often, however, application systems can experience thread deadlock because of excessive or unintentional contention due to application design. In these cases, scaling the application to larger and larger datasets or numbers of connections exacerbates the problem. Moreover, increasing the multiprogramming level to higher values rarely provides relief.

How to incur thread deadlock

To illustrate how to (easily) obtain instances of thread deadlock, I'll use a simple multi-client example where each client periodically inserts a row of "sensor data". The "sensor data" will be stored in the following table:



CREATE TABLE sensor_data (

  sensor_id BIGINT NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,

  sensor_number INTEGER,

  sensor_data  VARCHAR(50) NULL

)

In addition, each time a client inserts a row of sensor data, the client will update a summary record that contains the total number of inserted records for that sensor. The summary data table is as follows:



CREATE TABLE summary_data (

  sensor_number INTEGER PRIMARY KEY,

  sensor_count  BIGINT

)

The logic for each client connection is embodied in the following stored procedure:



CREATE OR REPLACE PROCEDURE INSERT_SENSOR_DATA()

  BEGIN

      DECLARE sensor_ident INTEGER;

      SET sensor_ident = MOD( 1000.0 * RAND(), 10 );

      INSERT INTO sensor_data VALUES( DEFAULT, sensor_ident, 'This is a test.' );

      IF EXISTS ( SELECT * FROM summary_data WHERE summary_data.sensor_number = sensor_ident ) THEN

        UPDATE summary_data SET sensor_count =

                sensor_count + 1 WHERE summary_data.sensor_number = sensor_ident

      ELSE INSERT INTO summary_data VALUES ( sensor_ident, 1 )

      END IF;

  END

The logic in the above procedure is straightforward. The first step (line 14) is to insert the row of sensor data to the sensor_data table. The second step is to modify the summary table. A complication is to determine if the summary row for the sensor in question is extant; if so, the count for that row is incremented (line 17), otherwise a new row is inserted (line 19).

Warning: while the above code is straightforward, it is also wrong. The procedure's logic as written contains a race condition and will cause frequent deadlocks and/or incorrect results, depending on the isolation level being used. These details are not important for the thread deadlock case I'm trying to illustrate.

Setup for this example also requires the setup of the tables utilized by the TRANTEST utility to track request response times, by executing the trantabs.sql script in the samples/trantest directory. Setup also requires the following:



TRUNCATE TABLE summary_data;

TRUNCATE TABLE sensor_data;

SET OPTION "DBA".LOG_DEADLOCKS = "ON";

CALL SA_SERVER_OPTION( 'RememberLastStatement', 'Yes' );

We use the performance analysis utility TRANTEST from the SQL Anywhere samples to execute the procedure above from multiple client connections simultaneously. Here's the TRANTEST command line:



TRANTEST -a ESQL -c "uid=dba;pwd=sql" -f insert_sensor_data.sql -i 2 -k 5 -l 15 -m 0 -n 25 -o results.txt -w 0

In summary, TRANTEST will create 25 ESQL connections that will continuously call the script in "insert_sensor_data.sql" with zero think time at isolation level 2 for a total elapsed time of 15 seconds, issuing a COMMIT every 5 transactions. The "insert_sensor_data.sql" file contains the single line



CALL INSERT_SENSOR_DATA()

I chose 25 clients because I'm running an 11.0.1 server with the default multiprogramming level of 20.

Problem determination of thread deadlock

There are two ways one can determine if thread deadlock has occurred, and the set of connections and SQL requests that were involved. The first is using the built-in diagnostic procedure sa_report_deadlocks(), which is enabled via the LOG_DEADLOCKS option as documented above. Here is a portion of the result after executing the above example with TRANTEST:

Line 54-67 is the DBISQL window illustrate thread deadlock, where the first row (line 54) is the "victim" (the CALL statement was executing on the last non-blocked thread). The rows following indicate the status of other connections; sure enough, each of these is blocked while executing the INSERT_SENSOR_DATA() procedure. The rows returned by sa_report_deadlocks() details both the table (object 3358, the summary_data table, from the SYSOBJECTS catalog table) and the row identifier of the row in summary_data causing the block.

The reason behind the contention is straightfoward: because the procedure attempts to both read (line 16 in the procedure) and modify (line 17) rows in the summary_data table, multiple clients will block on each other. With more clients than available threads, batched COMMITs, and zero think time, thread deadlock is inevitable. Increasing -gn to a value higher than the number of clients will prevent occurrences of thread deadlock, but won't solve the underlying problem, which is serialization of the execution of the INSERT_SENSOR_DATA() procedure.

A second mechanism to discover the existence of thread deadlock is through SQL Anywhere's Application Profiling capabilities, available through Sybase Central. Starting the Application Profiling wizard, followed by executing TRANTEST, yields a tracing database that documents the execution of each SQL statement issued by any connection. Here is the summary page for the test, as displayed by Sybase Central:

2009/09/thread_deadlock_summary.png

Note the summary times for the UPDATE summary_data statement: a total time of 2910 milliseconds for 9300-odd statement invocations, but a maximum time of 213 milliseconds - a sure sign of excessive blocking. If one switches to the Details pane, the occurrences of thread deadlock become obvious:

From this detailed view, one can analyze the concurrently-executing statements at the point of each occurrence of thread deadlock to determine what each thread in the server was executing at that time, which will, of course, again point to the badly-written INSERT_SENSOR_DATA() stored procedure - and the UPDATE statement on the summary_data table in particular.