Informix 14.10 Partial Indexes can save disk space

Abstract

Best practice has always been to avoid creating indexes on highly duplicated data. Scanning the entire table for a common value may be quicker, and updating an index can be very costly when many pointers to rows with the same value are spread over several pages. A workaround when an index is essential in this scenario is to extend it with a more selective column, but this makes the index bigger.

A far better solution first became available in IDS 14.10.FC2: Partial Indexes (see the relevant page in the IBM Knowledge Center).

In this article, we will demonstrate how to identify where such indexes might be appropriate, how to create them, and how much smaller they can potentially be.

Content

A classic example is the index on the “status” column of the Sage Line 500 “opheadm” table, which contains sales orders. Typically, 99% of rows have the value “8” for “Invoiced” in the status column: if we want to retrieve all those rows, the query optimizer will probably find it quicker to scan the whole table without using any index. However, most of the time we are looking for rows with one of the other values, for which we do need an index. The following simulates the scenario (the real table obviously has many more columns):
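
The original listing is not reproduced here, so the following is only a minimal sketch of such a simulation. The cut-down table layout, the "order_no" column, the index name "opheadm_status" and the loader procedure "sp_load_opheadm" are all assumptions made for illustration; only the "status" column and its value distribution come from the article.

```sql
-- Hypothetical cut-down table: the real opheadm has many more columns
CREATE TABLE opheadm
(
    order_no    SERIAL,
    status      CHAR(1)
);

-- Throwaway loader (name is an assumption): 1,000 rows for each of
-- statuses 1 to 7, and the remaining 993,000 rows with status 8
CREATE PROCEDURE sp_load_opheadm()

    DEFINE i INT;

    FOR i = 1 TO 1000000
        IF i <= 7000 THEN
            INSERT INTO opheadm (status) VALUES (MOD(i, 7) + 1);
        ELSE
            INSERT INTO opheadm (status) VALUES ('8');
        END IF
    END FOR

END PROCEDURE;

EXECUTE PROCEDURE sp_load_opheadm();
DROP PROCEDURE sp_load_opheadm;

-- Standard (full) duplicate index on the low-cardinality column
CREATE INDEX opheadm_status ON opheadm (status);

-- Distribution of values in the indexed column
SELECT   status, COUNT(*) AS number
FROM     opheadm
GROUP BY 1
ORDER BY 1;
```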

That returns:

status  number
1         1000
2         1000
3         1000
4         1000
5         1000
6         1000
7         1000
8       993000

The standard index has 2783 used pages, 2766 leaves, and 3 levels.

SQL to replace this with a partial index excluding invoiced rows is:
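
The original statement is not reproduced here. As a sketch only, the following uses the same form as the SQL generated by the function later in this article. The index name "opheadm_status" and the dbspace name "datadbs" are placeholders, not values from the article.

```sql
-- Remove the full index first
DROP INDEX opheadm_status;

-- Recreate it as a partial index: rows with status 8 fall into
-- the first partition, which is declared INDEX OFF and so holds no
-- index entries; all other rows are indexed in the remainder partition
CREATE INDEX opheadm_status ON opheadm (status)
    FRAGMENT BY EXPRESSION
        PARTITION part_0 (status IN ('8')) IN datadbs INDEX OFF,
        PARTITION part_1 REMAINDER IN datadbs
    ONLINE;
```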

Stating the dbspace name before “INDEX OFF” is recommended as it’s necessary for mixed page sizes: see Caveats.
The partition names “part_0” and “part_1” are arbitrary: any names allowed by the normal Informix object naming rules will do.

The new index has 24 used pages, 22 leaves, and 2 levels.
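
The article does not say how these figures were measured. One possible way to obtain similar numbers from SQL is sketched below. It assumes the placeholder names "mydb" and "opheadm_status", and that the "npused" column of sysmaster:sysptnhdr reflects used pages for each index partition; treat it as a starting point rather than the method used above.

```sql
-- Refresh the catalog figures used below
UPDATE STATISTICS LOW FOR TABLE opheadm;

-- Leaves and levels as recorded in the system catalog
SELECT  idxname, leaves, levels
FROM    sysindexes
WHERE   idxname = 'opheadm_status';

-- Used pages summed over every partition of the index
-- ('mydb' is a placeholder for your database name)
SELECT  SUM(P.npused) AS used_pages
FROM    sysmaster:systabnames AS N,
        sysmaster:sysptnhdr   AS P
WHERE   N.dbsname = 'mydb'
AND     N.tabname = 'opheadm_status'
AND     P.partnum = N.partnum;
```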

The following function helps you identify indexes that could be candidates, and generates SQL to replace them with partial indexes:

sp_partial_indexes.sql

DROP FUNCTION IF EXISTS sp_partial_indexes;

CREATE FUNCTION sp_partial_indexes
    (
        p_percent  SMALLFLOAT   DEFAULT 50,
        p_database VARCHAR(128) DEFAULT NULL
    )
    RETURNING VARCHAR(255) AS sql;

    -- Generate SQL to recreate recommended partial indexes
    -- Doug Lawry, April 2020

    -- Parameters:
    -- 1) Percentage of rows for a column value to be a candidate
    -- 2) Database if different to where the function is defined

    -- Limitations:
    -- 1) Rename indexes implied by constraints first
    -- 2) Only examines single-column duplicate indexes

    DEFINE  l_sql       LVARCHAR(1024);

    DEFINE  l_idxname,
            l_tabname,
            l_colname,
            l_oldname,
            l_owner,
            l_dbspace,
            l_value     VARCHAR(128);

    DEFINE  l_nrows,
            l_leaves,
            l_levels,
            l_minimum,
            l_fragments,
            l_count     INT8;

    SET DEBUG FILE TO '/tmp/spldebug.out';
    TRACE ON;

    IF p_database IS NULL THEN

        SELECT TRIM(odb_dbname) -- main session database
        INTO p_database
        FROM sysmaster:sysopendb
        WHERE odb_sessionid = DBINFO('sessionid')
        AND odb_odbno = 0;

    END IF

    LET l_sql =
        ' SELECT ' || 'I.owner,' ||
        ' '        || 'I.idxname,' ||
        ' '        || 'T.tabname,' ||
        ' '        || 'C.colname,' ||
        ' '        || 'T.nrows,' ||
        ' '        || 'I.leaves,' ||
        ' '        || 'I.levels' ||
        ' FROM '   || p_database || ':sysindexes AS I' ||
        ' JOIN '   || p_database || ':systables  AS T' ||
        ' ON '     || 'T.tabid = I.tabid' ||
        ' JOIN '   || p_database || ':syscolumns AS C' ||
        ' ON '     || 'C.tabid = I.tabid' ||
        ' AND '    || 'C.colno = I.part1' ||
        ' WHERE '  || 'idxtype = ''D''' ||
        ' AND '    || 'I.part2 = 0';

    PREPARE e_main FROM l_sql;
    DECLARE c_main CURSOR FOR e_main;
    OPEN    c_main;

    WHILE 1 = 1

        LET l_idxname = NULL;

        FETCH c_main INTO
            l_owner,
            l_idxname,
            l_tabname,
            l_colname,
            l_nrows,
            l_leaves,
            l_levels;

        IF l_idxname IS NULL THEN
            EXIT WHILE;
        END IF

        SELECT  COUNT(*)
        INTO    l_fragments
        FROM    sysmaster:systabnames
        WHERE   dbsname = p_database
        AND     tabname = l_idxname;

        IF l_fragments > 1 THEN
            CONTINUE WHILE; -- may already be a partial index
        END IF

        LET l_minimum = l_nrows * p_percent / 100;

        LET l_sql =
            'SELECT ' || l_colname  || ', COUNT(*)' ||
            ' FROM '  || p_database || ':' || l_tabname ||
            ' GROUP BY 1 ' ||
            ' HAVING COUNT(*) >= '  || l_minimum ||
            ' ORDER BY 1';

        PREPARE e_data FROM l_sql;
        DECLARE c_data CURSOR FOR e_data;
        OPEN    c_data;

        LET l_sql = NULL;

        WHILE 1 = 1

            LET l_count = NULL;

            FETCH c_data INTO
                l_value,
                l_count;

            IF l_count IS NULL THEN
                EXIT WHILE;
            END IF

            IF l_sql IS NULL THEN

                SELECT  MIN(TRIM(S.name))
                INTO    l_dbspace
                FROM    sysmaster:systabnames AS T,
                        sysmaster:sysdbspaces AS S
                WHERE   T.dbsname = p_database
                AND     T.tabname = l_idxname
                AND     S.dbsnum  = T.dbsnum;

                IF l_idxname MATCHES ' *' THEN

                    LET l_oldname = TRIM(l_idxname);
                    LET l_idxname = 'ix_' || l_oldname;

                    RETURN
                        'RENAME INDEX' ||
                        ' '    || TRIM(l_oldname) ||
                        ' TO ' || TRIM(l_idxname) ||
                        '; -- system generated constraint index names begin with space'
                    WITH RESUME;

                END IF

                LET l_idxname = TRIM(l_idxname);
                LET l_owner   = TRIM(l_owner);

                RETURN
                    'DROP INDEX' ||
                    ' ' || l_idxname || '; --' ||
                    ' ' || l_leaves  || ' leaves,' ||
                    ' ' || l_levels  || ' levels'
                WITH RESUME;

                LET l_sql =
                    'CREATE INDEX' ||
                    ' '''  || l_owner ||
                    '''.'  || l_idxname ||
                    ' ON ' || l_tabname ||
                    ' ('   || l_colname || ')' ||
                    ' FRAGMENT BY EXPRESSION' ||
                    ' PARTITION part_0 (';

            END IF

            IF l_value IS NULL THEN

                LET l_sql = l_sql || l_colname || ' IS NULL';

            ELSE

                IF l_sql MATCHES '* IS NULL' THEN
                    LET l_sql = l_sql || ' OR ';
                END IF

                IF l_sql MATCHES '*''' THEN
                    LET l_sql = l_sql || ',';
                ELSE
                    LET l_sql = l_sql || l_colname || ' IN (';
                END IF

                LET l_sql = l_sql || '''' || l_value || '''';

            END IF

        END WHILE

        CLOSE   c_data;
        FREE    c_data;
        FREE    e_data;

        IF l_sql IS NOT NULL THEN

            IF l_sql MATCHES '*''' THEN
                LET l_sql = l_sql || ')';
            END IF

            LET l_sql = l_sql ||
                ') IN ' || l_dbspace || ' INDEX OFF,' ||
                ' PARTITION part_1 REMAINDER' ||
                ' IN ' || l_dbspace || ' ONLINE;';

            RETURN l_sql WITH RESUME;

        END IF

    END WHILE

    CLOSE   c_main;
    FREE    c_main;
    FREE    e_main;

END FUNCTION;

It checks all duplicate indexes on single columns in a specified database to see whether any value accounts for at least a given percentage of rows in the table (50% by default). For example, create and populate the example table as described in the IBM Knowledge Center Partial Indexes page, but with this index:

Then run the new function:

SQL generated:

The number of leaves and levels is shown so that you can decide which indexes are big enough to matter.
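
The same figures can be listed directly from the system catalog before running the function. This is a minimal sketch using the catalog columns that the function itself reads. The "tabid > 99" filter, which skips system catalog tables, is an addition not present in the function, and the figures are only as fresh as the last UPDATE STATISTICS.

```sql
-- Duplicate single-column indexes on user tables, biggest first
SELECT  T.tabname,
        I.idxname,
        T.nrows,
        I.leaves,
        I.levels
FROM    sysindexes AS I
JOIN    systables  AS T
ON      T.tabid = I.tabid
WHERE   I.idxtype = 'D'     -- duplicates allowed
AND     I.part2 = 0         -- single column only
AND     T.tabid > 99        -- skip system catalog tables
ORDER BY I.leaves DESC;
```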

Run on the standard demo database:

SQL generated:

The extra first statement above goes some way towards dealing with indexes implied by primary or foreign key constraints, whose actual names begin with a space to prevent alteration: see the RENAME INDEX documentation page.

However, CREATE INDEX still fails with error -350 “Index already exists on the column” as the index has not in fact been dropped but only hidden again. The following is a complete solution using more meaningful object names:

As you can see, the SQL generated is only a guide, and you may well need to edit the results, as well as experimenting with different thresholds.
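
Experimenting with thresholds only means varying the first argument. The calls below are illustrative: the threshold values are arbitrary, and "stores_demo" is assumed to be the name of the standard demo database on your instance. Both parameters have defaults in the function definition, so either can be omitted.

```sql
-- Default threshold of 50% in the current database
EXECUTE FUNCTION sp_partial_indexes();

-- Stricter: only values found in at least 90% of a table's rows
EXECUTE FUNCTION sp_partial_indexes(90);

-- Looser threshold, run against another database
EXECUTE FUNCTION sp_partial_indexes(10, 'stores_demo');
```

Note that the function as listed turns SPL tracing on, so each run also writes to /tmp/spldebug.out.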

Caveats

Tables composed of multiple fragments (partitions) are part of the parallelisation features reserved for Enterprise Edition. Unfortunately, Partial Indexes have been implemented using the same FRAGMENT BY EXPRESSION (or PARTITION) syntax, and this is rejected on lower editions (except Developer) with error 26453 “Fragmentation is not supported in this edition of IDS”.

This article was updated on 18th August 2020 with new information from Roland Wintgen, who has opened a case with IBM. A problem occurs if the database or table is in a dbspace with a non-default page size. For example, in an instance with 2KB pages by default, if you create a 4KB page dbspace “data4kb” and then a new database in it, the following simpler form of SQL for one of our examples produces an error:

You have to declare a dbspace name for both fragments in this situation. Perhaps it otherwise defaults to the root dbspace, causing the error. However, it still fails:

The solution is to name the fragments (also known as partitions) as well:

The stored procedure and examples in this article have been amended accordingly. If IBM accepts this as a bug, we will provide further updates here with the defect number and the IDS version containing the fix when either is available. Meanwhile, the above is a full workaround.

Conclusion

If Informix 14.10 Partial Indexes are available in your version and edition, they may save you considerable disk space and make reads, inserts and deletes lighter.

Disclaimer

Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.