Updated December 2021

Abstract

This new feature, documented here, was introduced with IBM Informix Dynamic Server (IDS) version 14.10. It is an enhancement to Enterprise Replication (ER) that allows data changes to be handled by a stored procedure rather than being applied directly to a replica table. It will typically be used in a Loopback Replication arrangement, so that the stored procedure fired is contained in the same database as the source table.

Such a replicate is also known as an “asynchronous post-commit trigger”. This is because, unlike a conventional trigger, it runs in the background with respect to the calling application and fires after the transaction has been committed. The ER infrastructure ensures reliable buffering and delivery should a backlog build up or the server be restarted.

This article gives an example of how this feature might be used to perform whatever type of action you wish once data changes have been committed.

Content

Links in this article not shown in full are for pages within the new IBM Documentation site:

https://www.ibm.com/docs/en/informix-servers/14.10

The ER “cdr” (continuous data replication) commands included here are clickable documentation links and, for brevity, are not described in detail.

Relevant material is here:

https://github.com/nagaraju-inturi/informix-async-postcommit-triggers

The test system used for this article is an IDS 14.10.FC5 Developer Edition Linux Docker container created with the image available from:

https://hub.docker.com/r/ibmcom/informix-developer-database

See also our article on Loopback Replication, which contains prerequisite instructions.

The documentation page for Replication to SPL routine describes two alternative usages that require target procedures with very different parameters:

  1. one per replicated column plus additional control columns;
  2. a single JSON document.

Simple tests of both follow, using the “stores_demo” database provided with IDS and installed with the “dbaccessdemo” command. In both cases, changes to a table will be written to a staging/audit table.

Example SQL to set up database objects for the first approach is:

CREATE TABLE state_log
(
    op_type     CHAR(1),
    source_id   INT,
    commit_time INT,
    txn_id      BIGINT,
    userid      INT, 
    session_id  INT,
    old_code    CHAR(2),
    old_sname   CHAR(15),
    new_code    CHAR(2),
    new_sname   CHAR(15)
);

CREATE PROCEDURE sp_state_log
(
    op_type     CHAR(1),
    source_id   INT,
    commit_time INT,
    txn_id      BIGINT,
    userid      INT, 
    session_id  INT,
    old_code    CHAR(2),
    old_sname   CHAR(15),
    new_code    CHAR(2),
    new_sname   CHAR(15)
)

INSERT INTO state_log VALUES
(
    op_type,
    source_id,
    commit_time,
    txn_id,
    userid, 
    session_id,
    old_code,
    old_sname,
    new_code,
    new_sname
);

END PROCEDURE;

Note that the “userid” and “session_id” parameters were added in IDS 14.10.FC6 and should be removed from the above on earlier versions. The rest of this article shows example output from FC5 but requires no other code changes.

  • That’s rather laborious: the control columns have to be listed 3 times and the columns in the original table 6 times.
  • Parameter types must conform to the documentation/schema and be in the right order, whereas their names are up to you (a direct call is sketched below).
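
Before defining the replicate, the target procedure can be smoke-tested with a direct call. The control values below are invented purely for illustration (drop the two zero arguments on versions before FC6):

EXECUTE PROCEDURE sp_state_log('I', 1, 1621534543, 12124707279076,
                               0, 0, NULL, NULL, '53', 'GB');

SELECT * FROM state_log;

TRUNCATE state_log; -- clear the test row before the real test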

Shell commands to define and start the replicate are:

cdr define replicate repl_state -C always -S row -A -R \
    --splname=sp_state_log \
    "P stores_demo@g_informix:informix.state" "select * from informix.state" \
    "R stores_demo@g_loopback:informix.state" "select * from informix.state"

cdr start replicate repl_state

See our Loopback Replication article for a description of the standard options chosen above.

SQL for a basic test is:

INSERT INTO state VALUES (53, 'GB');
UPDATE state SET sname = 'UK' WHERE code = 53;
DELETE FROM state WHERE code = 53;
SELECT * FROM state_log;

Results were:

op_type  source_id  commit_time  txn_id          old_code  old_sname  new_code  new_sname
I        1          1621534543   12124707279076                       53        GB
U        1          1621534543   12124707287288  53        GB         53        UK
D        1          1621534543   12124707295776  53        UK
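
The “commit_time” values are Unix epoch seconds, which the built-in DBINFO function (used again later in this article) can render as local DATETIME values; a quick sketch:

SELECT op_type, DBINFO('utc_to_datetime', commit_time) AS committed
FROM state_log;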

Recommended SQL statements to set up database objects for the second approach are:

CREATE TABLE data_change_log (data BSON);

CREATE INDEX data_change_log ON data_change_log
    (BSON_VALUE_BIGINT(data, 'commit_time')) USING BSON;

CREATE PROCEDURE sp_data_change_capture (data JSON)
    INSERT INTO data_change_log VALUES (data);
END PROCEDURE;

  • That’s massively simpler: column and parameter names do not need to be specified.
  • The same log table and procedure can be used for multiple replicates (the data contains the table name).
  • BSON has been chosen for the table column type (the INSERT recasts it) because it provides the accessor functions we need, as sketched after this list.
  • We can then index “commit_time” as shown, which might be necessary should the table become sizeable.
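
To illustrate those accessor functions, here is a minimal sketch of querying the captured documents; the field names anticipate the JSON output shown later in this article:

SELECT  BSON_VALUE_VARCHAR (data, 'table')         AS table_name,
        BSON_VALUE_VARCHAR (data, 'operation')     AS operation,
        BSON_VALUE_BIGINT  (data, 'commit_time')   AS commit_time,
        BSON_VALUE_VARCHAR (data, 'rowdata.sname') AS new_sname
FROM    data_change_log;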

Shell commands to define and start the replicate differ only on line 2:

cdr define replicate repl_state -C always -S row -A -R \
    --jsonsplname=sp_data_change_capture \
    "P stores_demo@g_informix:informix.state" "select * from informix.state" \
    "R stores_demo@g_loopback:informix.state" "select * from informix.state"

cdr start replicate repl_state

  • Option “--splname” has been replaced with “--jsonsplname”.
  • A shared stored procedure can be used for all source tables.

After rerunning the same simple test as before, the recorded data can be displayed with spaces between the JSON elements, so that it word-wraps more readably, via this trick (which does not affect how the JSON document behaves):

SELECT REPLACE(data::JSON::LVARCHAR, ',"', ', "') AS data
FROM data_change_log;

data
{"operation":"insert", "table":"state", "owner":"informix", "database":"stores_demo", "txnid":12124695617764, "commit_time":1621529490, "rowdata":{"code":"53", "sname":"GB"}}
{"operation":"update", "table":"state", "owner":"informix", "database":"stores_demo", "txnid":12124695625976, "commit_time":1621529490, "rowdata":{"code":"53", "sname":"UK"}, "before_rowdata":{"code":"53", "sname":"GB"}}
{"operation":"delete", "table":"state", "owner":"informix", "database":"stores_demo", "txnid":12124695638244, "commit_time":1621529490, "rowdata":{"code":"53", "sname":"UK"}}

The flexibility of the BSON data is such that we could write general-purpose code applying to any number of tables. For example, we might want a report showing only the columns updated during a specific date range, as recorded in the staging/audit table:

CREATE FUNCTION sp_data_change_diff
(
    p_tabname       VARCHAR(128) DEFAULT '*',
    p_datetime_1    DATETIME YEAR TO SECOND DEFAULT NULL,
    p_datetime_2    DATETIME YEAR TO SECOND DEFAULT NULL
)
RETURNING
    DATETIME YEAR TO SECOND     AS date_time,
    VARCHAR(128)                AS table_name,
    VARCHAR(255)                AS key_value,
    VARCHAR(128)                AS column_name,
    VARCHAR(255)                AS value_before,
    VARCHAR(255)                AS value_after;
{
    Show columns updated in JSON loopback replication output
    See also https://en.wikipedia.org/wiki/Year_2038_problem
    Doug Lawry, Oninit Consulting, May 2021

    Examples usage:

    EXECUTE FUNCTION sp_data_change_diff();
    EXECUTE FUNCTION sp_data_change_diff('customer');
    EXECUTE FUNCTION sp_data_change_diff
    (
        '*',
        '2021-05-19 12:27:00',
        '2021-05-19 12:29:00'
    );
}
    DEFINE l_data       BSON;
    DEFINE l_tabname    VARCHAR(128);
    DEFINE l_utc_time   BIGINT;
    DEFINE l_datetime   DATETIME YEAR TO SECOND;
    DEFINE l_colno      SMALLINT;
    DEFINE l_colname    VARCHAR(128);
    DEFINE l_key_value  VARCHAR(255);
    DEFINE l_value_1    VARCHAR(255);
    DEFINE l_value_2    VARCHAR(255);

    FOREACH

        SELECT  data
        INTO    l_data
        FROM    data_change_log
        WHERE   BSON_VALUE_VARCHAR (data, 'table') MATCHES p_tabname
        AND     BSON_VALUE_VARCHAR (data, 'operation') = 'update'
        AND     BSON_VALUE_BIGINT  (data, 'commit_time')
        BETWEEN sp_datetime_to_utc (NVL(p_datetime_1, MDY(01,02,1970)))
        AND     sp_datetime_to_utc (NVL(p_datetime_2, MDY(01,18,2038)))

        LET l_tabname  = BSON_VALUE_VARCHAR (l_data, 'table');
        LET l_utc_time = BSON_VALUE_BIGINT  (l_data, 'commit_time');
        LET l_datetime = DBINFO ('UTC_TO_DATETIME', l_utc_time);

        FOREACH

            SELECT colno, colname
            INTO l_colno, l_colname
            FROM systables AS t
            JOIN syscolumns AS c ON c.tabid = t.tabid
            WHERE tabname = l_tabname
            ORDER BY colno

            LET l_value_1 = BSON_VALUE_VARCHAR (l_data, 'before_rowdata.' || l_colname);
            LET l_value_2 = BSON_VALUE_VARCHAR (l_data,        'rowdata.' || l_colname);

            IF l_colno = 1 THEN
                LET l_key_value = l_value_1;
            END IF

            IF NVL(l_value_1,'')
            != NVL(l_value_2,'')
            THEN
                RETURN
                    l_datetime,
                    l_tabname,
                    l_key_value,
                    l_colname,
                    l_value_1,
                    l_value_2
                WITH RESUME;
            END IF

        END FOREACH

    END FOREACH

END FUNCTION;

That calls one other SPL function:

CREATE FUNCTION sp_datetime_to_utc
(
    date_time DATETIME YEAR TO SECOND
    DEFAULT   CURRENT  YEAR TO SECOND
)
RETURNING
    BIGINT AS utc_time;
{
    Inverse of DBINFO('utc_to_datetime',INT)
    Doug Lawry, Oninit Consulting, May 2021

    Check same result to test correctness:

        SELECT
            DBINFO('utc_current'),
            sp_datetime_to_utc()
        FROM systables
        WHERE tabid = 1;

    Maximum safe range with up to 24 hours TZ offset:

        '1970-01-02 00:00:00' -- zero is 1970-01-01
        '2038-01-18 00:00:00' -- Google "Year 2038"

    This function will cope after that using BIGINT,
    but DBINFO('utc_to_datetime',INT) will need fixing.
}
    DEFINE t INTERVAL DAY(7) TO SECOND;
    DEFINE d INTERVAL DAY(7) TO DAY;
    DEFINE s INTERVAL SECOND(7) TO SECOND;

    -- Time since UTC start
    LET t = date_time - DBINFO('utc_to_datetime', 0);

    -- Split days/seconds as max INTERVAL scale is 9:
    LET d = t;
    LET s = t - d;

    -- Recombine total seconds, allowed via CHAR:
    RETURN
        d :: CHAR(9) * 24 * 60 * 60 +
        s :: CHAR(9);

END FUNCTION;
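
A quick round trip through both functions, using a “commit_time” value captured earlier, should return the input unchanged:

SELECT
    sp_datetime_to_utc(DBINFO('utc_to_datetime', 1621529490)) AS utc_time
FROM systables
WHERE tabid = 1;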

IDS has the built-in function DBINFO('utc_to_datetime',INT) but not DBINFO('datetime_to_utc',DATETIME).

There is a feature request for this submitted in 2015:

https://ibm-data-and-ai.ideas.aha.io/ideas/INFX-I-139

Searching for an SPL solution shows that Jonathan Leffler’s is the most widely shared:

https://stackoverflow.com/questions/14187569/convert-datetime-to-unix-epoch-in-informix

However, that doesn’t handle time zones correctly, and our version above is simpler.

To test “sp_data_change_diff”, two similar JSON replicates were created using a convenient shell script:

$ cat repl_define.sh
[ $# -lt 2 ] && exec echo "Usage: $0 database table-names"

set -e # exit on error

DB=$1 ; shift   # database name
TO=informix     # table owner
G1=g_informix   # groups in sqlhosts
G2=g_loopback

for TN # table name
do

    RP=repl_$TN # replicate name
    T1=$TO.$TN  # owner + table names
    T2=$T1

    cdr define replicate $RP -C always -S row -A -R \
        --jsonsplname=sp_data_change_capture \
        "P $DB@$G1:$T1" "select * from $T1" \
        "R $DB@$G2:$T2" "select * from $T2"

    cdr start replicate $RP

done

$ repl_define.sh stores_demo customer stock
Interpreting this replicate as a master replicate.
Verification of stores_demo@g_informix:informix.customer started
Verification of stores_demo@g_informix:informix.customer is successful
Verification of stores_demo@g_loopback:informix.customer started
Verification of stores_demo@g_loopback:informix.customer is successful
Interpreting this replicate as a master replicate.
Verification of stores_demo@g_informix:informix.stock started
Verification of stores_demo@g_informix:informix.stock is successful
Verification of stores_demo@g_loopback:informix.stock started
Verification of stores_demo@g_loopback:informix.stock is successful

SQL statements to clear the staging/audit table and run suitable test transactions were:

TRUNCATE data_change_log;

UPDATE  customer
SET     city = 'Sunny Vale'
WHERE   city = 'Sunnyvale';

UPDATE  customer
SET     city = 'Sunnyvale'
WHERE   city = 'Sunny Vale';

UPDATE  stock
SET     description = REPLACE(description, 'tires', 'tyres')
WHERE   description MATCHES '* tires';

UPDATE  stock
SET     description = REPLACE(description, 'tyres', 'tires')
WHERE   description MATCHES '* tyres';
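
Replication is asynchronous, so allow a moment for ER to fire the procedure; the staging table should then hold one BSON document per updated row, i.e. ten in this test (three customer rows and two stock rows, each updated twice):

SELECT COUNT(*) FROM data_change_log;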

We can produce a report across all the data changes captured with:

EXECUTE FUNCTION sp_data_change_diff();

That returns:

date_time            table_name  key_value  column_name  value_before   value_after
21/05/2021 18:14:29  customer    101        city         Sunnyvale      Sunny Vale
21/05/2021 18:14:29  customer    109        city         Sunnyvale      Sunny Vale
21/05/2021 18:14:29  customer    111        city         Sunnyvale      Sunny Vale
21/05/2021 18:14:29  customer    101        city         Sunny Vale     Sunnyvale
21/05/2021 18:14:29  customer    109        city         Sunny Vale     Sunnyvale
21/05/2021 18:14:29  customer    111        city         Sunny Vale     Sunnyvale
21/05/2021 18:14:29  stock       101        description  bicycle tires  bicycle tyres
21/05/2021 18:14:29  stock       101        description  bicycle tires  bicycle tyres
21/05/2021 18:14:29  stock       101        description  bicycle tyres  bicycle tires
21/05/2021 18:14:29  stock       101        description  bicycle tyres  bicycle tires

Parameters can be given to match specific tables and/or a date range: see the comments in the code above.
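
For example, because the function compares table names with MATCHES, wildcard patterns are accepted:

EXECUTE FUNCTION sp_data_change_diff('cust*');

EXECUTE FUNCTION sp_data_change_diff
(
    '*',
    '2021-05-21 18:00:00',
    '2021-05-21 19:00:00'
);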

Should you need to match multiple table names, that can be done by treating the result set as a table:

SELECT * FROM TABLE (FUNCTION sp_data_change_diff())
AS data (date_time, table_name, key_value, column_name, value_before, value_after)
WHERE table_name IN ('customer', 'stock');

The same results are returned in this case.

Caveats

IDS 14.10 is required. See other prerequisites in our Loopback Replication article.

Limitations restated from the documentation are:

  • The target table must exist even if otherwise unused.
  • Do not combine participant definitions that include both a table and an SPL routine as the target.
  • Out-of-row data types such as TEXT, BYTE, BLOB, and CLOB are not supported.

We raised two support cases which now have defect numbers:

IT37197: cdr list catalog is not showing ‘replication to SPL routine’ which was defined in cdr define repl

IT36450: --jsonsplname of “cdr repl” with CHAR/VCHAR column replicated containing single/double quote JSON won’t be constructed correctly.

The first means you cannot easily find out what procedure name was given (it is also not included in “cdr list repl” output) or correctly regenerate your “cdr define repl” commands. The second is self-explanatory and is a show-stopper if your data is affected.

See also these defects:

IT36973: -208 error in RIS file for transactions > 65535 rows replicating to SPL due to CDR data sync thread exhausting SQL statement IDs

IT36987: SPL conflict resolution or post commit triggers potentially failing with errors 674 or 201

Conclusion

Although the “replication to SPL routine” feature still seems to be a work in progress as of IDS 14.10.xC5, it is very flexible and could be a unique solution to unusual problems. Other use cases described elsewhere include real-time analytics such as aggregates or leader boards.

Disclaimer

Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.

Author