IBM Informix Dynamic Server (IDS) requires careful handling of data manipulation language (DML) operations that affect millions of rows. If such an operation is performed in a single SQL statement on a logged database, the engine must hold a lock on every affected row until the statement completes: even if no transaction has been started with BEGIN WORK, the statement still runs inside an implicit transaction.
Assuming the table has LOCK MODE set to ROW (the norm for OLTP) rather than PAGE (usually avoided because of concurrency problems), the result is millions of locks, particularly as locks are required on each index as well as on the table. The documentation for the LOCKS configuration parameter, which defines the initial size of the lock structure in the resident shared memory segment, states that each lock requires 100-200 bytes depending on the platform. Should the engine run out and need to allocate an extension lock structure in virtual shared memory, it may have to add new segments dynamically, totalling gigabytes. Unless the Enterprise Edition is in use, this may starve user sessions of memory or cause them to be rejected outright.
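To put rough, purely illustrative numbers on this: deleting 10 million rows from a table with two indexes means roughly 30 million locks (one per row, plus one per index entry); at 200 bytes each that is in the order of 6 GB of lock memory.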
Following a feature request by the author, the configuration parameter SESSION_LIMIT_LOCKS has been available since IDS 12.10.xC4 and can be used to prevent any one session from acquiring an excessive number of locks, though the offending statement will obviously be aborted if the limit is reached.
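If that trade-off suits your environment, the limit is set in the ONCONFIG file; the value below is purely illustrative:

```
SESSION_LIMIT_LOCKS  500000   # maximum locks any one session may hold
```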
Locks can be avoided entirely by placing an EXCLUSIVE lock on the table, by temporarily disabling logging on the database, or by altering the table type to RAW, but any of these will most likely interfere with other users, break applications, and invalidate replication.
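For completeness, the table-lock variant looks like this (table and column names are illustrative, and the date literal assumes a matching DBDATE setting); note that it still runs as one large transaction, so the logical log problem discussed next still applies:

```sql
BEGIN WORK;
LOCK TABLE orders IN EXCLUSIVE MODE;  -- one table lock instead of millions of row locks
DELETE FROM orders WHERE order_date < DATE('01/01/2015');
COMMIT WORK;  -- the table lock is released on commit
```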
There is a second major problem with very large transactions: if so much of the logical log space is consumed that the threshold set by the LTXHWM configuration parameter is crossed, the statement will fail with "Long transaction aborted". Rolling the transaction back can then take at least as long as the original work, during which time the instance may be unusable if the LTXEHWM exclusive threshold has also been reached.
The right way to avoid all of this is to split DML statements into smaller transactions, each affecting at most a few thousand rows. The two most common scenarios are loading fresh data from a file and deleting a large number of rows. Both can be handled safely, committing a few thousand rows per transaction, with the dbload tool provided with IDS (see the wrapper script at the end of this article) and the dbdelete open source ESQL-C program by Art Kagel (see also a Python version by Andrew Ford). Other scenarios may need to be coded specifically.
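The batching idea can be sketched in SPL. The sketch below is illustrative only (it is not the article's sp_dbdelete, and the table, column, and date values are hypothetical): it deletes matching rows via a hold cursor, committing every 5,000 rows so that no transaction ever holds more than 5,000 row locks.

```sql
-- Illustrative sketch: batch deletion with a hold cursor.
CREATE PROCEDURE batch_delete_sketch()
    DEFINE v_id  INT;
    DEFINE v_cnt INT;

    LET v_cnt = 0;
    BEGIN WORK;
    FOREACH c_del WITH HOLD FOR
        SELECT order_id INTO v_id FROM orders
         WHERE order_date < DATE('01/01/2015')

        DELETE FROM orders WHERE order_id = v_id;
        LET v_cnt = v_cnt + 1;
        IF v_cnt = 5000 THEN
            COMMIT WORK;  -- WITH HOLD keeps the cursor open across the commit
            BEGIN WORK;
            LET v_cnt = 0;
        END IF
    END FOREACH
    COMMIT WORK;
END PROCEDURE;
```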
This article describes a stored procedure alternative modelled on dbdelete.
A complex archiving program may need to delete rows depending on the contents of a temporary table which is only visible to the current session. Compiling and deploying a C program may be inconvenient or prohibited. The SQL session may be running on a different machine where such a tool is not available or for which such a tool cannot easily be compiled, such as on Windows.
Whatever the reason, stored procedure sp_dbdelete listed after this article might be an attractive alternative. The available parameters are:
p_table VARCHAR(128), -- name of table containing rows to delete
p_select VARCHAR(255), -- WHERE clause or key value SELECT statement
p_column VARCHAR(128) DEFAULT 'ROWID', -- integer key column name
p_rows SMALLINT DEFAULT 100 -- rows per delete statement
You must supply the table name first. If the table is not in the current database, specify it as "database:table"; if it is in a different instance, use "database@instance:table".
The set of rows to be deleted must also be defined as either:
- conditions within the table affected, starting with keyword WHERE (unlike dbdelete);
- a complete SELECT statement returning integer key values to be deleted.
Either form can be upper or lower case. If the second form is used, the key column is assumed to be ROWID unless a column name is provided as the third parameter. Only columns of type INT, INT8, BIGINT, or their SERIAL equivalents are supported at this time. ROWID is always available unless the table has multiple partitions (also known as fragments) and was created without rowids.
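Both invocation forms can be sketched as follows; the table, column, and date values are illustrative, and the date literal assumes a matching DBDATE setting:

```sql
-- Form 1: a WHERE clause applied directly to the target table.
EXECUTE PROCEDURE sp_dbdelete('mydb:orders',
    'WHERE order_date < DATE(''01/01/2015'')');

-- Form 2: a SELECT returning integer key values, with an explicit
-- key column name and 200 rows per DELETE statement.
EXECUTE PROCEDURE sp_dbdelete('mydb:orders',
    'SELECT order_id FROM mydb:orders_purge', 'order_id', 200);
```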
The fourth and last parameter will rarely need specifying. You can perform your own experiments, but 100 seems to be the smallest number of rows per DELETE that does not degrade performance. Thousands are not possible because of limits on the maximum statement size accepted by EXECUTE IMMEDIATE in Stored Procedure Language (SPL), which varies with version and is not well documented.
Return values are:
INT AS selected,
INT AS deleted,
INT AS seconds;
The first two will be the same unless some of the selected rows were no longer there at the time of deletion, in which case "deleted" will be slightly lower. This is a concern when using ROWID on a volatile table, as the wrong rows could be deleted.
The run time in seconds is also returned, calculated accurately by fetching the time from a system table before and after. Beware of using CURRENT, which in SPL behaves as a constant holding the time at which the procedure started.
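One common idiom for that timing trick, shown here as a hedged sketch (it assumes the one-row sysmaster:sysdual table is available in your version), forces CURRENT to be evaluated at the moment of the query rather than taken as the SPL constant:

```sql
DEFINE v_start DATETIME YEAR TO SECOND;
-- Selecting from a table makes the engine evaluate CURRENT now,
-- instead of returning the procedure's start time.
SELECT CURRENT YEAR TO SECOND INTO v_start
  FROM sysmaster:sysdual;
```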
Within SPL, when a SQL statement references an object without specifying the database, it is assumed to be in the same one as the procedure, not the database to which the calling session is connected. This is potentially inconvenient if you want just one copy of the procedure defined, but will be calling it from many application databases in that instance. This is solved in sp_dbdelete as follows:
- If the table name has been specified without a database prefix, the top-level database for the session is determined and applied.
- When supplying an entire SQL statement in the second parameter to fetch key values, prefix object names with "$dbname" if you want that replaced in the same manner.
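For example (table and column names are hypothetical), a call made from any application database could read:

```sql
-- '$dbname' is replaced with the calling session's database name.
EXECUTE PROCEDURE sp_dbdelete('audit_log',
    'SELECT log_id FROM $dbname:purge_candidates', 'log_id');
```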
Do not execute this procedure inside a transaction as that will negate the purpose!
See the DOCUMENT section at the end of the procedure listing for examples of usage.
Should the need arise, a very similar procedure for mass UPDATE could be coded with one extra parameter containing the SET clause.
Excessively large transactions are a menace. Tools exist to avoid them, and stored procedure sp_dbdelete is a useful addition to the arsenal so that millions of rows can be safely deleted in one operation inside any SQL session.
The above is provided "as is" without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.