This article provides a script to analyse what database objects were locked when IBM Informix Dynamic Server (IDS) “asserts” (saves an evidence file) including what sessions were waiting on another, which is particularly useful to diagnose deadlocks.
In an actual recent case, an overnight batch job was sometimes failing with the following SQL and ISAM error numbers:
In order to ascertain what other session was causing the problem, we set a trap for any deadlock with “onmode -I -143” as documented here.
The $ONCONFIG file contained “DUMPSHMEM 0” so that a shared memory dump would not be written, but only an assert file (AF) and stack trace (with file name extension “.rawstk”).
We then waited until the next night, when an alarm program email was received and this was written into the message log (names have been anonymised in italics throughout this article):
An AF contains a section listing locks produced with “onstat -k” (which can also be run directly) with the output format described here. That section is located and aggregated by the following shell script:
It was run as follows:
This saved two files in the current directory:
The SQL in the shell script shows that the columns each contains are:
The first character of the lock category constructed by the script is one of these letters:
(K for “key” is used so as not to be confused with I for “Intent”.)
The type from “onstat -k” is then appended but with “HDR+” suppressed. The documentation lists the possible combinations as:
Note that stage 1 of our shell script skips Intent locks which are rarely useful. If you do need them, comment out these lines:
The results from two stages are saved so that file 1 can be used when analysing an AF off-line or file 2 when on-line with “partnum” values decoded into database and table names. Note that two other solutions exist to perform this decoding:
- Upgrade to IDS 14.10.FC6 or above which has an extra “table_name” column added to “onstat -k” output;
- Compile the “partn” open source C program from IIUG which can append objects names to a listing.
The second file also has “sysmaster” database objects excluded. In the actual case, this was then read into Excel and:
- column headings added in bold;
- Excel table inserted;
- rows resorted using column controls;
- index locks hidden using the “type” column control;
- borders specified.
The result was:
Both “owner” and “waiter” are hexadecimal user thread addresses. You can look for these in the first column of “onstat -u” output (described here) either in an AF or at the command line. In the actual case, relevant lines were:
The entire details for each of those session IDs in column 3 above could then be found by searching for them in the “onstat -g ses 0” section. That enabled us to determine that an entirely different application was unexpectedly using the same stored procedure concurrently. The solution was to reschedule jobs to run at different times, though recoding the order of statements in the procedure might also have worked.
Access at the command line as a privileged user – typically “informix” – is required.
With that proviso, the script given in this article is all you need to extract details of problem locks in assert files or “onstat -k” output.
Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.