Abstract
This third article in the Easy IWA series provides specific worked examples of creating two Data Marts under IBM Informix Dynamic Server and IBM Informix Warehouse Accelerator 12.10.FC4. Both examples use the “smart mart” utility, with the first example loading from external tables into IBM Informix tables and subsequently into an IWA Data Mart; the second loads directly into an IWA Data Mart from external tables. Both examples use the TPCH data generation utility and schema as detailed in a previous article, with the addition of the removal of unsuitable “*_comment” columns via a simple in-line pipe and sed or awk approach.
Content
The creation and loading of a Data Mart under IBM Informix can at first appear convoluted; however, it is a straight forward set of activities that can be placed in a single, repeatable SQL script.
The process described here is based on the following premises (the scripts are written so that they can be re-run; the first run will report an error regarding the dropping of the source database and the dropping of the Data Mart):
An IBM Informix instance with an IWA Accelerator (JJDWA1) is present and running
No databases exist for the source of data for the Accelerator (“jj_tpch_internal” and “jj_tpch_external”)
Data will be generated using the TPCH data generation utility with redundant “*_comment” column data removed in-line (jj_tpch_data_stripped.sh)
The scripts for this article can be found in the resources section at the end of this article
If using Scale Factor 1, then both Data Marts are able to loaded, but this is running close to the limit imposed by having 2,560 MB RAM allocated to the WORKERS and 512 allocated to the CO-ORDINATOR
Two approaches will be described:
- Creation and loading a Data Mart from IBM Informix standard tables (jj_tpch_internal_stripped.sql)
- Creation of a TPCH database schema (excluding “*_comment” columns) and defining external tables defined as pipes based on the standard tables
- Loading IBM Informix tables with TPCH data from the external tables using the TPCH data generation utility (jj_tpch_data_stripped.sh)
- Loading the IWA Data Mart from IBM Informix tables
- Creation and loading a Data Mart from IBM Informix external tables (jj_tpch_external_stripped.sql)
- Creation of a TPCH database schema (excludes “*_comment” columns) and defining external tables based on the standard tables
- Loading IBM IWA directly from the external tables using the TPCH data generation utility (jj_tpch_data_stripped.sh)
As the data generation is the same for both approaches, the first issue to address is removal of the last “*_comment” field from the data generation scripts to ensure effective use of available resources.
Removal of “*_comment” columns from data generation
The following script achieves this, using awk or sed to remove in-line the last “|” delimited field, placing all data on respective pipes for subsequent usage by external tables:
- Drop any previous incarnation of the “jj_tpch_internal” database
- Create the jj_tpch_internal database with buffered logging in an appropriate dbspace, followed by creation of raw TPCH tables excluding the “*_comment” columns:
- Create the supporting external tables which will subsequently be used to load the raw tables (the working directory is “jj_tpch_external” where all scripts and external tables are):
- Load the raw tables from the external tables (note that the data generation should already be running; it can be started in the background before the SQL scripted is run):
- Alter the raw tables to standard once loading has completed:
- Create unique indexes against the candidate primary keys:
- Run a full set of update statistics
- Drop any previous incarnation of the “jj_tpch_mart_int” Data Mart from the Accelerator “JJDWA1” and the current database
- Clean up any previous probing data generated by “set environment use_dwa ‘probe start’;”
- Start the SmartMart probing utility to “probe” any queries that are run in this session, gathering information for later processing into a mart definition
- Generally the requirement of running the query to produce data is not required or desired during query probing, so setting “set explain on avoid_execute;” prevents query execution
- Having previously established the two FACT tables, lineitem and partsupp, for the Data Mart, the following 4 statements dictate:
- That lineitem is a FACT table for the first select statement, with the select statement detailing all potential joins between lineitem and the DIMENSION tables
- That partsupp is a FACT table for the second select statement, with the select statement detailing all potential joins between partsupp and the DIMENSION tables
- Turn off set explain and probing now that the candidate queries have been processed:
- At this point a separate logged database is required for processing the captured probe data and converting into an actual Mart definition; here “jj_tpch_internal_mart” is dropped and created for use.
- The following statement processes probing data that was captured against the “jj_tpch_internal” database and creates a Data Mart definition “jj_tpch_mart_int”
- The Data Mart definition jj_tpch_mart_int residing within jj_tpch_internal_mart now needs to be passed to the Accelerator JJDWA1 whilst connected to the database where the source data is held
- All that remains, which is really the main activity being performed here, is to load the Data Mart jj_tpch_mart_int in the Accelerator JJDWA1 with NONE for the locking preferences
- For completeness, the Data Mart definition jj_tpch_mart_int can be extracted from the Accelerator JJDWA1 and placed in a file jj_tpch_mart_int_??.xml, where ?? is replaced. This file is in XML format
- Drop any previous incarnation of the “jj_tpch_external” database:
- Create the “jj_tpch_external” database with buffered logging in an appropriate dbspace:
- For ease of use, create a set of standard tables (named tabname_std) which reflect the format of the external tables; these will subsequently be used to create a corresponding set of external tables using the SAMEAS clause.
- Create the required external tables based on the tabname_std table with the corresponding pipe; these will be the source of data used to load directly into the data mart:
- The creation of the data mart requires knowledge of unique keys, and creation of unique disabled indexes against the external tables denotes this:
- Drop any previous incarnation of the “jj_tpch_mart_ext” Data Mart from the Accelerator “JJDWA1” and the current database
- Clean up any previous probing data generated by “set environment use_dwa ‘probe start’;”
- Start the SmartMart probing utility to “probe” any queries that are run in this session, gathering information for later processing into a mart definition
- Generally the requirement of running the query to produce data is not required or desired during query probing, so setting “set explain on avoid_execute;” prevents query execution
- Having previously established the two FACT tables, lineitem and partsupp, for the Data Mart, the following 4 statements dictate:
- That lineitem is a FACT table for the first select statement, with the select statement detailing all potential joins between lineitem and the DIMENSION tables
- That partsupp is a FACT table for the second select statement, with the select statement detailing all potential joins between partsupp and the DIMENSION tables
- Turn off set explain and probing now that the candidate queries have been processed:
- At this point a separate logged database is required for processing the captured probe data and converting into an actual Mart definition; here “jj_tpch_internal_mart” is dropped and created for use.
- The following statement processes probing data that was captured against the “jj_tpch_external” database and creates a Data Mart definition “jj_tpch_mart_ext”
- The Data Mart definition jj_tpch_mart_ext residing within jj_tpch_external_mart now needs to be passed to the Accelerator JJDWA1 whilst connected to the database where the source data is held
- All that remains, which is really the main activity being performed here, is to load the Data Mart jj_tpch_mart_ext in the Accelerator JJDWA1 with NONE for the locking preferences
- For completeness, the Data Mart definition jj_tpch_mart_ext can be extracted from the Accelerator JJDWA1 and placed in a file jj_tpch_mart_ext_??.xml, where ?? is replaced. This file is in XML format
Conclusion
This article, and the attached files, demonstrate the complete process of creation and loading of an IBM Informix Warehouse Accelerator, using both internal table data (standard tables) and external tables ( based on pipes) as the source of data. The process is straight forward and the provided examples provide a clear initial basis to build upon. Using external tables, any data source can potentially be transformed into a data mart within IWA enabling high performance “what if” analysis in a discrete environment.
Resources
Disclaimer
The code fix suggested above is provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.
Contact us
If you have any questions regarding data marts and IBM Informix Warehouse Accelerator and would like to find out more, simply contact us.