Abstract

This relatively new feature, documented here, was introduced with IBM Informix Dynamic Server (IDS) version 12.10.xC11 in March 2018, enabling replication of tables within the same database or instance. It is an enhancement to Enterprise Replication (ER) which was previously only supported between different instances. This article gives an example of how it can reduce downtime when slow schema changes are required on large tables in continuous use.

Content

Links in this article not shown in full are for pages within the new IBM Documentation site:

https://www.ibm.com/docs/en/informix-servers/14.10

Included ER “cdr” (continuous data replication) commands are clickable documentation links and not described in detail for brevity.

Existing relevant articles and demonstrations are here:

https://github.com/nagaraju-inturi/loopback-replication

https://databasemusing.wordpress.com/2020/06/15/loopback

Informix Tech Talks: Informix Large Table Operations by Ben Thompson

The test system used for this article is an IDS 14.10.FC5 Developer Edition Linux Docker container created with the image available from:

https://hub.docker.com/r/ibmcom/informix-developer-database

If you are not already using ER, you will need two new dbspaces – with your preferred names – to hold the ER “syscdr” database and smart blobs which contain transaction contents, as was done for this article in the Docker container as follows:

eval cd $(dirname $(onstat -c ROOTPATH))
(umask 006 ; touch cdrdbs.000 cdrsbs.000)

onspaces -c -d cdrdbs -p $(pwd)/cdrdbs.000 -o 0 -s 204800
onspaces -c -S cdrsbs -p $(pwd)/cdrsbs.000 -o 0 -s 512000 -Df "AVG_LO_SIZE=2,LOGGING=ON"

The sizes above are the defaults used by “cdr migrate server“. You may wish to reduce the estimated average blob size to the minimum of 2KB (default 32KB) to avoid running out of metadata space, and/or enable logging if part of a high-availability cluster (default OFF), by appending one or both of the qualifiers shown above in blue.

Corresponding configuration parameters then need updating, together with two others as usually recommended:

cdr change config "CDR_DBSPACE cdrdbs"
cdr change config "CDR_QDATA_SBSPACE cdrsbs"
cdr change config "CDR_QUEUEMEM 262144"
cdr change config "CDR_SUPPRESS_ATSRISWARN 1,2,3"

There is another less well documented and non-essential parameter:

$ onstat -g cfg full CDR_QHDR_DBSPACE

IBM Informix Dynamic Server Version 14.10.FC5X4DE -- On-Line -- Up 1 days 03:26:59 -- 472196 Kbytes

Configuration Parameter Info

id   name                      type    maxlen   units   rsvd  tunable
157  CDR_QHDR_DBSPACE          CHAR    129

     default : rootdbs
     onconfig:
     current :

     Description:
     Specifies the location of the dbspace that Enterprise Replication
     uses to store the transaction record headers spooled from the send
     and receive queues.

     The CDR_QHDR_DBSPACE configuration parameter specifies the
     location of the dbspace that Enterprise Replication uses to store
     the transaction record headers spooled from the send and receive
     queues. By default, Enterprise Replication stores the transaction
     record headers in the root dbspace. For more information, see
     Transaction Record dbspace.

     Restriction: Do not change the value of CDR_QHDR_DBSPACE after you
     initialize Enterprise Replication.

If CDR_DBSPACE is set, the default for the above is in fact that and not “rootdbs” as stated. Specifically, setting CDR_QHDR_DBSPACE specifies the dbspace for the following “syscdr” tables (verified by testing):

blkdelete
control_send_stxn
trg_receive_stxn
trg_send_srep
trg_send_stxn

The next prerequisite step is to add a loopback connector (red) and ER group information (blue) in the “sqlhosts” file as described here and in the following example:

g_informix    group       -             -       i=1
informix      onsoctcp    *localhost    9088    g=g_informix

g_loopback    group       -             -       i=2
loopback      onsoctcp    *localhost    9090    g=g_loopback

Each asterisk instructs IDS to listen on all IP addresses available when started, which is not an ER requirement, but is often useful.

The $ONCONFIG file should be updated with the additional alias, for example:

DBSERVERNAME informix
DBSERVERALIASES loopback

A new listener thread can then be started and tested with:

onmode -P start loopback
echo | INFORMIXSERVER=loopback dbaccess sysmaster

ER can now be initiated with:

cdr define server -I g_informix
cdr define server -I g_loopback -S g_informix

The “syscdr” database will now have been created in the dbspace specified by CDR_DBSPACE.

We can now experiment with loopback replication. The following table was created in empty database “test”:

CREATE TABLE test_table
(
    serial_no   SERIAL        NOT NULL  PRIMARY KEY,
    batch_name  VARCHAR(255)  NOT NULL,
    batch_row   SMALLINT      NOT NULL,

    UNIQUE (batch_name, batch_row),
    CHECK  (batch_row > 0)
);

For brevity, the above does not follow best practice of explicitly named indexes and constraints.

Sample data was inserted with:

CREATE PROCEDURE sp_test_data(nrows SMALLINT)

    -- Much quicker in a transaction to reduce log buffer flushing.
    -- Assumes few enough rows to avoid "Long Transaction Aborted".
    -- Table exclusively locked to prevent lock structure growth.

    DEFINE n LIKE test_table.batch_name;
    DEFINE r SMALLINT;

    LET n = CURRENT YEAR TO SECOND;

    BEGIN WORK;

    LOCK TABLE test_table IN EXCLUSIVE MODE;

    FOR r = 1 TO nrows
        INSERT INTO test_table VALUES (0, n, r);
    END FOR

    COMMIT WORK;

END PROCEDURE;

EXECUTE PROCEDURE sp_test_data(4000);

Imagine that “batch_name” is in fact being populated with DATETIME data by applications. We therefore want to alter the VARCHAR column to a more appropriate data type:

ALTER TABLE test_table MODIFY
(
    batch_name  DATETIME YEAR TO SECOND  NOT NULL
);

However, by running the following SQL before and after that on a test system, we see that the “partnum” has changed, indicating that this is a slow alter and not an in-place alter:

SELECT partnum FROM systables WHERE tabname = 'test_table';

Note this relevant Request for Enhancement (RFE):

Add onconfig parameter to not allow slow alters

https://ibm-data-and-ai.ideas.aha.io/ideas/INFX-I-393

We therefore plan to reduce downtime on the live system by creating a replica table but with the improved data type in blue:

CREATE TABLE test_table_new
(
    serial_no   SERIAL                   NOT NULL  PRIMARY KEY,
    batch_name  DATETIME YEAR TO SECOND  NOT NULL,
    batch_row   SMALLINT                 NOT NULL,

    UNIQUE (batch_name, batch_row),
    CHECK  (batch_row > 0)
);

Copying existing contents before setting up replication is quicker and lighter. In the example scenario, we have HDR and/or RSS replica instances. The new table must therefore be logged (not RAW) so that contents are propagated to secondary instances. We must therefore avoid “long transaction aborted” by splitting the operation into a few thousand rows per commit. If you have it or can compile it, the easiest method is to use “dbcopy” from Art Kagel’s open source ESQL-C “utils2_ak”:

https://www.askdbmgt.com/my-utilities.html

A solution would then be a shell command in this form:

dbcopy -d test -t test_table -T test_table_new -F -a -w 60

You could UNLOAD the table in “dbaccess” – or copy to an external table which is quicker – and then use the standard IDS “dbload” tool to break the job into transactions of a few thousand rows each, but that requires space in a file system, so might not be convenient. You might be able to get round that with FIFO devices, but that’s more complex.

Alternatively, write a stored procedure such as this:

CREATE FUNCTION sp_test_copy()
    RETURNING BIGINT AS rows_inserted;

    -- Only a few thousand rows should be committed per transaction:
    -- singleton inserts are too slow as flushes log buffer each time
    -- but must commit regularly to avoid "Long Transaction Aborted".
    -- Table is exclusively locked to further improve performance.

    DEFINE v_serial_no   LIKE test_table_new.serial_no;
    DEFINE v_batch_name  LIKE test_table_new.batch_name;
    DEFINE v_batch_row   LIKE test_table_new.batch_row;
    DEFINE v_inserted    BIGINT;

    LET v_inserted = 0;

    BEGIN WORK;
    LOCK TABLE test_table_new IN EXCLUSIVE MODE;

    FOREACH
    WITH HOLD -- otherwise COMMIT closes cursor

        SELECT *
        INTO v_serial_no, v_batch_name, v_batch_row
        FROM test_table

        INSERT INTO test_table_new
        VALUES (v_serial_no, v_batch_name, v_batch_row);

        LET v_inserted = v_inserted + 1;

        IF MOD(v_inserted, 2000) = 0 THEN
            COMMIT WORK;
            BEGIN WORK;
            LOCK TABLE test_table_new IN EXCLUSIVE MODE;
        END IF

    END FOREACH;

    COMMIT WORK;

    RETURN v_inserted;

END FUNCTION;

Bulk data loading is always much quicker with indexes and constraints disabled, and the resulting indexes will be more efficient, so the best process is therefore:

SET CONSTRAINTS FOR test_table_new DISABLED;
SET INDEXES     FOR test_table_new DISABLED;

TRUNCATE test_table_new;         -- if rerunning
EXECUTE FUNCTION sp_test_copy(); -- or use dbcopy

SET CONSTRAINTS FOR test_table_new ENABLED;
SET INDEXES     FOR test_table_new ENABLED;

Note that index builds, which will be performed by the last statement above, are faster and more stable with sorting in file systems (instead of temp dbspaces) specified in environment variable PSORT_DBTEMP. Furthermore, see:

https://www.oninitgroup.com/faq-items/ram-disk-and-informix

We are now ready to enable replication between the two tables with this shell script:

set -e # exit on error

DB=test         # database name
TO=informix     # table owner
TN=test_table   # table name
G1=g_informix   # groups in sqlhosts
G2=g_loopback
T1=$TO.$TN      # owner + table names
T2=$T1\_new
RP=repl_$TN     # replicate name

cdr define replicate $RP -C always -S row -A -R \
    "P $DB@$G1:$T1" "select * from $T1" \
    "R $DB@$G2:$T2" "select * from $T2"

cdr start replicate $RP

CT=check_$(date +%Y%m%d_%H%M%S)
echo "Starting task $CT"
cdr check replicate -r $RP -m $G1 -a -R -n $CT

Parameters used to define the replicate:

ShortLongDescription

-C always

--conflict=always

"Use the always option if you do not want Enterprise Replication to resolve conflicts, but you do want replicated changes to be applied even if the operations are not the same on the source and target servers. Use the always-apply conflict resolution rule only with a primary-target replication system."

This setting is appropriate for one-way replication. Must be specified.

-S row

--scope=row

"Evaluate one row at a time and apply the replicated rows that win the conflict resolution with the target rows."

This setting is appropriate for one-way replication. Default is "transaction".

-A

--ats

"Activates aborted transaction spooling for replicate transactions that fail to be applied to the target database."

Useful for troubleshooting, unlikely to occur with one-way replication.

-R

--ris

"Activates row-information spooling for replicate row data that fails conflict resolution or encounters replication order problems."

Useful for troubleshooting, unlikely to occur with one-way replication.

"P ..."

[participant modifier]

"For primary-target replicates, specifies that the participant is a primary participant, which both sends and receives replicated data."

Required for loopback.

"R ..."

[participant modifier]

"For primary-target replicates, specifies that the participant is a receive-only target participant, which only receives data from primary participants."

Required for loopback.


Parameters used to check the replicate:

Short

Long

Description

-r repl

--repl=repl

"Specifies the name of the replicate to check."

Replicate must be specified.

-m

--master

"Specifies the database server to use as the reference copy of the data."

Source must be specified in some way and this is the simplest.

-a

--all

"Specifies that all servers defined for the replicate are checked."

Target must be specified in some way and this is the simplest.

-R

--repair

"Specifies that rows that are found to be inconsistent are repaired."

This is the purpose of the check command in our script.

-n task

--name=task

"Specifies that the progress of this command can be monitored. Information about the operation is stored under the specified progress report task name on the server on which the command was run."

Cannot easily be monitored without the check task being named.


Output looks like this:

Interpreting this replicate as a master replicate.
Verification of test@g_informix:informix.test_table started
Verification of test@g_informix:informix.test_table is successful
Verification of test@g_loopback:informix.test_table_new started
Verification of test@g_loopback:informix.test_table_new is successful
Starting task check_20210416_121945

Apr 16 2021 12:20:55 ------   Table scan for repl_test_table start  --------

Node                  Rows     Extra   Missing  Mismatch Processed
---------------- --------- --------- --------- --------- ---------
g_informix            4000         0         0         0         0
g_loopback            4000         0         0         0         0

Apr 16 2021 12:21:01 ------   Table scan for repl_test_table end   ---------

You could watch the progress of the last stage (updating every 10 seconds) on another screen with:

cdr stats check -r 10 check_20210416_121945

Alternatively, on Linux, this might be easier on the eye:

watch -d -n 10 "cdr stats check check_20210416_121945"

You can list previous check tasks with:

cdr stats check

Example output is:

Task check_20210415_185458
repl_test_table          Completed
                         Started Apr 15 18:56:57, Elapsed Time 0:00:02
Task check_20210415_191142
repl_test_table          Completed
                         Started Apr 15 19:13:39, Elapsed Time 0:00:03
Task check_20210416_121945
repl_test_table          Completed
                         Started Apr 16 12:20:59, Elapsed Time 0:00:01

You can delete the record of a previous check task with a command such as:

cdr stats check --delete check_20210416_121945"

At an agreed time, when ready to go live with the new table, you should check there is no replication backlog with:

onstat -g rqm brief

The cut-over steps would then be:

  1. Disconnect applications using the old table.
  2. Stop loopback replication of that table.
  3. Rename tables so that the new one will be used.
  4. Reconnect applications.

If we were to use admin mode – in which only user “informix” (by default) can be connected – as a means to kick users off for step 1, and if they automatically reconnect when allowed, a script for the whole process would be:

onmode -jy # admin mode

cdr delete replicate repl_test_table

dbaccess test <<EOF
RENAME TABLE test_table TO test_table_old;
RENAME TABLE test_table_new TO test_table;
EOF
onmode -m # on-line mode

Rolling back the change would be achieved by renaming the tables back to how they were. The old table should be dropped once you know it is no longer needed.

ER could subsequently be stopped completely and the “syscdr” database dropped with:

cdr delete server g_loopback
cdr delete server g_informix

Caveats

IDS 12.10.xC11 or later is required for loopback replication.

ER generally has these prerequisites:

  1. The database must have logging enabled with “ontape“, “ondblog“, or “alter logmode“.
  2. Tables being replicated must be logged (cannot be RAW).
  3. A unique index or ERKEY columns must exist for each replicated table.
  4. Disk space is required for mandatory ER dbspaces if they need creating.

Conclusion

With those provisos, down time associated with a slow table alteration can be reduced to a few seconds using this technique. Everything you need is given in this article, though you may want to review general ER documentation for best tuning.

Disclaimer

Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.