Easy IWA: Part 1 – The TPCH database, data generation and IBM Informix 12.10.FC4


The TPCH database - how to use it by Oninit Consulting

Abstract

This first article in the Easy IWA series addresses a recurring need in testing and proof-of-concept (POC) exercises: a referentially complete database schema, together with the means to generate and load data for it. It covers implementing the TPCH database schema in IBM Informix, building the dbgen data generation utility and loading the generated data.

Content

The TPCH database and dbgen data generation utility, courtesy of http://www.tpc.org, were developed to provide an approach to benchmarking and include:

The tpch Database structure
- A referentially complete database schema
The tpch dbgen utility
- A utility to populate the database with a specified amount of data (Scale Factor – SF)
The tpch benchmark queries – not detailed here
- A set of pre-defined data warehouse queries to run against the database

This article covers creating the tpch database and populating it with data generated by the dbgen utility; the data is loaded either from flat files generated by dbgen or through named pipes.

The tpch Database structure

In essence, the schema consists of 8 tables, 8 explicit unique indexes supporting 8 primary keys and 9 explicit indexes supporting 9 foreign keys.
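The object counts above can be cross-checked against a SQL script with a small helper. This is only a sketch: count_stmts is a hypothetical function name, and it assumes each statement starts at the beginning of a line, as in the SQL shown in this article.

```shell
# Count the lines in a SQL file that start with a given statement prefix.
# grep -c prints the number of matching lines; -i ignores case.
count_stmts() {
    # $1 = statement prefix (treated as a regular expression), $2 = SQL file
    grep -ci "^$1" "$2"
}

# Example calls (assuming jj_dbgen.sql is in the current directory):
#   count_stmts 'create raw table' jj_dbgen.sql
#   count_stmts 'create unique index' jj_dbgen.sql
#   count_stmts 'create index' jj_dbgen.sql
```

Because the prefix is anchored to the start of the line, 'create index' does not also count the 'create unique index' statements.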

The tpch dbgen utility

The tpch dbgen utility generates, by default, a set of flat files suitable for loading into the tpch schema with the size based on the “Scale Factor” argument; a scale factor of 1 produces a complete data set of approximately 1 GB, a scale factor of 10 produces a data set of approximately 10 GB etc.

Download the following zip file http://www.tpc.org/tpch/spec/tpch_2_17_0.zip to a temporary directory and unzip.

Go to the extracted tpch_2_17_0/dbgen directory and copy makefile.suite to Makefile; within the Makefile amend the following to suit your environment:

CC=gcc
DATABASE=INFORMIX
MACHINE=LINUX
WORKLOAD=TPCH

Then run make and check that the compilation is clean.

After making sure that there is adequate filesystem disk space available (i.e. more than 1 GB), run ./dbgen -s 1
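The free space check can be scripted before dbgen is run. The following is a minimal sketch; has_free_gb is a hypothetical helper, and it relies on the rule of thumb from this article that a scale factor of N needs roughly N GB.

```shell
# Succeed if the filesystem holding directory $1 has more than $2 GB free.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk: POSIX output format, sizes in 1024-byte blocks;
    # the fourth field of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -gt $((need_gb * 1024 * 1024)) ]
}

# Example: only generate scale factor 1 data if more than 1 GB is free.
#   has_free_gb . 1 && ./dbgen -s 1
```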

If the following files are produced, then dbgen has been successfully built:

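-rw-r--r-- 1 informix informix   1409184 Aug 5 16:07 supplier.tbl
-rw-r--r-- 1 informix informix       389 Aug 5 16:07 region.tbl
-rw-r--r-- 1 informix informix  24207240 Aug 5 16:07 part.tbl
-rw-r--r-- 1 informix informix 118984616 Aug 5 16:07 partsupp.tbl
-rw-r--r-- 1 informix informix 171952161 Aug 5 16:07 orders.tbl
-rw-r--r-- 1 informix informix      2224 Aug 5 16:07 nation.tbl
-rw-r--r-- 1 informix informix 759863287 Aug 5 16:07 lineitem.tbl
-rw-r--r-- 1 informix informix  24346144 Aug 5 16:07 customer.tbl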
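Rather than checking by eye, the presence of all eight files can be tested with a short loop. This is a sketch only; check_tbl_files is a hypothetical helper name, and the table list is the one used throughout this article.

```shell
# Succeed only if all eight .tbl files exist and are non-empty in directory $1
# (defaults to the current directory).
check_tbl_files() {
    dir=${1:-.}
    for entity in lineitem customer orders partsupp part supplier region nation
    do
        # -s is true when the file exists and has a size greater than zero
        if [ ! -s "$dir/$entity.tbl" ]; then
            echo "missing or empty: $dir/$entity.tbl" >&2
            return 1
        fi
    done
    echo "all 8 .tbl files present in $dir"
}
```

Note that the -s test suits flat files only; a named pipe reports a size of zero and would be flagged as empty.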

For completeness and readability, perform the following:

Remove the just-generated .tbl files from tpch_2_17_0/dbgen
Create a new directory, /home/informix/dbgen_article (for example)
Copy tpch_2_17_0/dbgen/dists.dss to /home/informix/dbgen_article/
Copy tpch_2_17_0/dbgen/dbgen to /home/informix/dbgen_article/

The dbgen utility can be run with various options; some examples are detailed below:

./dbgen -h
- Show complete usage
./dbgen -s 1 -f
- Force overwrite of existing files
./dbgen -s 1 -T c
- Generate just the customers (there are options for each table)
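The -T option takes a one-letter code per table, and the codes are case sensitive. The mapping below is a sketch based on the codes used in this article and on the dbgen usage text as I understand it; dbgen_flag is a hypothetical helper, so confirm the letters against the usage output of your own build.

```shell
# Print the dbgen -T code for a table name.
dbgen_flag() {
    case "$1" in
        customer) echo c ;;
        lineitem) echo L ;;
        nation)   echo n ;;
        orders)   echo O ;;
        part)     echo P ;;
        region)   echo r ;;
        supplier) echo s ;;
        partsupp) echo S ;;
        *) echo "unknown table: $1" >&2; return 1 ;;
    esac
}

# Example: generate only the region data at scale factor 1.
#   ./dbgen -v -T "$(dbgen_flag region)" -s 1
```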

The following helper script generates all table data in parallel as flat files:

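> cat jj_dbgen_data.sh
SF=1
# remove any previously generated flat files
for entity in lineitem customer orders partsupp part supplier region nation
do
    rm -f $entity.tbl
done
# generate each table in its own background process
./dbgen -v -T c -s $SF &
./dbgen -v -T L -s $SF &
./dbgen -v -T n -s $SF &
./dbgen -v -T O -s $SF &
./dbgen -v -T P -s $SF &
./dbgen -v -T r -s $SF &
./dbgen -v -T s -s $SF &
./dbgen -v -T S -s $SF &
# block until every dbgen process has finished
wait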

The generated data can also be placed, in parallel, on pipes with a slight amendment to the above script:

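> cat jj_dbgen_data.sh
SF=1
# replace each flat file with a named pipe of the same name
for entity in lineitem customer orders partsupp part supplier region nation
do
    rm -f $entity.tbl
    mknod $entity.tbl p
done
# each dbgen process writes to its pipe once a reader opens it
./dbgen -v -T c -s $SF &
./dbgen -v -T L -s $SF &
./dbgen -v -T n -s $SF &
./dbgen -v -T O -s $SF &
./dbgen -v -T P -s $SF &
./dbgen -v -T r -s $SF &
./dbgen -v -T s -s $SF &
./dbgen -v -T S -s $SF &
wait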

However, the data generation will not proceed until each pipe has started to be read; the following helper script can be used for flushing all data through the pipes:

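> cat jj_flush_pipes.sh
# read every pipe in parallel and discard the data
for entity in lineitem customer orders partsupp part supplier region nation
do
    cat $entity.tbl > /dev/null &
done
wait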

Hint – wait until 100% displayed for each dbgen execution before executing this script.
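The blocking behaviour of a named pipe can be seen without dbgen or Informix. In the sketch below, fifo_demo is a hypothetical function: a background printf stands in for dbgen, cat stands in for the reader, and the row is made-up sample text in the pipe-delimited layout.

```shell
# Show that a writer to a named pipe waits until a reader opens the pipe.
fifo_demo() {
    tmpdir=$(mktemp -d)
    mkfifo "$tmpdir/demo.tbl"        # same effect as: mknod demo.tbl p
    # The writer stands in for dbgen; it blocks until a reader arrives.
    printf '0|AFRICA|sample comment|\n' > "$tmpdir/demo.tbl" &
    # The reader stands in for the external table load (or cat > /dev/null).
    cat "$tmpdir/demo.tbl"
    wait                             # collect the background writer
    rm -rf "$tmpdir"
}
```

Calling fifo_demo prints the sample row; without the cat, the background writer would stay blocked indefinitely.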

With the above information, there are two approaches that can be followed to load data; one is loading the data from flat files and the second is loading the data from pipes.

Loading the data from flat file

Using the region table as an example, the following SQL details:

Creation of the database
Creation of the region table as raw
Creation of an external “disk” table region_ext “sameas” region
Insertion of data into the region table
Altering the region table to standard
The addition of a unique index and the primary key to the region table

Before running this SQL, the file /home/informix/dbgen_article/region.tbl should be created using ./dbgen -v -T r -s 1:

create database jj_dbgen in datadbs_01 with buffered log;
create raw table "informix".region (
r_regionkey integer not null,
r_name char(25) not null,
r_comment varchar(152) ) lock mode row;
create external table "informix".region_ext sameas region
using ( datafiles("disk:/home/informix/dbgen_article/region.tbl"));
insert into region select * from region_ext;
alter table region type (standard);
create unique index region_pk on region (r_regionkey);
alter table region add constraint (primary key (r_regionkey));

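Hint – set the DBDATE environment variable to Y4MD- (four-digit year, month, day, separated by hyphens) so that the dates generated by dbgen load correctly
Hint – the database, jj_dbgen, cannot be dropped until a Level 0 archive is performed
A fake backup can be run using onbar -b -F
Alternatively, the individual tables can be dropped without a Level 0 archive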
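The date hint matters for the orders and lineitem tables, which contain date columns. The sketch below assumes the setting Informix writes as Y4MD- and that dbgen emits dates as YYYY-MM-DD; is_y4md_date is a hypothetical helper for sanity-checking a value before loading.

```shell
# Tell the Informix client to read dates as four-digit year, month, day
# separated by hyphens (assumed spelling of the setting: Y4MD-).
export DBDATE=Y4MD-

# Succeed if $1 looks like YYYY-MM-DD (digits only, no range checking).
is_y4md_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: o_orderdate is the fifth pipe-delimited field in orders.tbl.
#   is_y4md_date "$(head -1 orders.tbl | cut -d'|' -f5)" && echo "date format OK"
```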

Loading the data from a pipe

The only differences when loading from a pipe as opposed to disk are:

Remove the flat file region.tbl:
rm region.tbl
Create region.tbl as a pipe:
mknod region.tbl p
Amend the external table definition for region_ext to use “pipe”:
create external table "informix".region_ext sameas region
using ( datafiles("pipe:/home/informix/dbgen_article/region.tbl"));
Prime the region pipe with data; note that this will remain running:
./dbgen -v -T r -s 1

Then run the preceding short SQL.

Hint – do not attempt to read from an external table based on pipes unless there is something to read – IT WILL HANG. If a read is attempted, a simple echo "A" > tablename.tbl can be run and the SQL will complete.

Complete SQL

Below is the complete SQL, which can be run using dbaccess - jj_dbgen.sql after running jj_dbgen_data.sh to generate the data as flat files.

Modify the database creation statement to denote an appropriate dbspace.

To load the data from pipes instead, change the external table definitions from “disk” to “pipe”, modify the jj_dbgen_data.sh script to generate the data on pipes, run it in the background with ./jj_dbgen_data.sh &, then run dbaccess - jj_dbgen.sql.
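The change from “disk” to “pipe” can be made with sed instead of by hand. This is a sketch under the assumption that every external table uses the datafiles("disk:...") form shown in this article; to_pipe_sql and the output file name in the example are hypothetical.

```shell
# Rewrite external table definitions from disk: to pipe: (stdin to stdout).
to_pipe_sql() {
    sed 's/datafiles("disk:/datafiles("pipe:/'
}

# Example: write a pipe variant of the script, leaving the original untouched.
#   to_pipe_sql < jj_dbgen.sql > jj_dbgen_pipe.sql
```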

Note that the SQL uses the “NOVALIDATE” option when creating foreign keys, a feature introduced in 12.10.xC4; remove “NOVALIDATE” if working with versions prior to 12.10.xC4.
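For older versions, the keyword can be stripped with sed rather than edited out line by line. The sketch assumes the foreign key statements end in "novalidate);" in lower case, as in the SQL for this article; strip_novalidate and the output file name are hypothetical.

```shell
# Remove the novalidate keyword from foreign key definitions (stdin to stdout).
strip_novalidate() {
    sed 's/ novalidate)/)/'
}

# Example: produce a variant of the script for versions without NOVALIDATE.
#   strip_novalidate < jj_dbgen.sql > jj_dbgen_validate.sql
```

Without NOVALIDATE the server checks the existing rows when each foreign key is added, so the constraint step is likely to take longer on large scale factors.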

database jj_dbgen;
drop table nation;
drop table region;
drop table part;
drop table supplier;
drop table partsupp;
drop table customer;
drop table orders;
drop table lineitem;
close database;

drop database jj_dbgen;

create database jj_dbgen in datadbs_01 with buffered log;
select current from systables where tabid = 1;

create raw table "informix".nation (
n_nationkey integer not null ,
n_name char(25) not null ,
n_regionkey integer not null ,
n_comment varchar(152) ) lock mode row;

create raw table "informix".region (
r_regionkey integer not null ,
r_name char(25) not null ,
r_comment varchar(152) ) lock mode row;

create raw table "informix".part (
p_partkey integer not null ,
p_name varchar(55) not null ,
p_mfgr char(25) not null ,
p_brand char(10) not null ,
p_type varchar(25) not null ,
p_size integer not null ,
p_container char(10) not null ,
p_retailprice decimal(15,2) not null ,
p_comment varchar(23) not null ) lock mode row;

create raw table "informix".supplier (
s_suppkey integer not null ,
s_name char(25) not null ,
s_address varchar(40) not null ,
s_nationkey integer not null ,
s_phone char(15) not null ,
s_acctbal decimal(15,2) not null ,
s_comment varchar(101) not null ) lock mode row;

create raw table "informix".partsupp (
ps_partkey integer not null ,
ps_suppkey integer not null ,
ps_availqty integer not null ,
ps_supplycost decimal(15,2) not null ,
ps_comment varchar(199) not null ) lock mode row;

create raw table "informix".customer (
c_custkey integer not null ,
c_name varchar(25) not null ,
c_address varchar(40) not null ,
c_nationkey integer not null ,
c_phone char(15) not null ,
c_acctbal decimal(15,2) not null ,
c_mktsegment char(10) not null ,
c_comment varchar(117) not null ) lock mode row;

create raw table "informix".orders (
o_orderkey integer not null ,
o_custkey integer not null ,
o_orderstatus char(1) not null ,
o_totalprice decimal(15,2) not null ,
o_orderdate date not null ,
o_orderpriority char(15) not null ,
o_clerk char(15) not null ,
o_shippriority integer not null ,
o_comment varchar(79) not null ) lock mode row;

create raw table "informix".lineitem (
l_orderkey integer not null ,
l_partkey integer not null ,
l_suppkey integer not null ,
l_linenumber integer not null ,
l_quantity decimal(15,2) not null ,
l_extendedprice decimal(15,2) not null ,
l_discount decimal(15,2) not null ,
l_tax decimal(15,2) not null ,
l_returnflag char(1) not null ,
l_linestatus char(1) not null ,
l_shipdate date not null ,
l_commitdate date not null ,
l_receiptdate date not null ,
l_shipinstruct char(25) not null ,
l_shipmode char(10) not null ,
l_comment varchar(44) not null ) lock mode row;

create external table "informix".nation_ext sameas nation 
using ( datafiles("disk:/home/informix/dbgen_article/nation.tbl"));
create external table "informix".region_ext sameas region 
using ( datafiles("disk:/home/informix/dbgen_article/region.tbl"));
create external table "informix".part_ext sameas part 
using ( datafiles("disk:/home/informix/dbgen_article/part.tbl"));
create external table "informix".supplier_ext sameas supplier 
using ( datafiles("disk:/home/informix/dbgen_article/supplier.tbl"));
create external table "informix".partsupp_ext sameas partsupp 
using ( datafiles("disk:/home/informix/dbgen_article/partsupp.tbl"));
create external table "informix".customer_ext sameas customer 
using ( datafiles("disk:/home/informix/dbgen_article/customer.tbl"));
create external table "informix".orders_ext sameas orders 
using ( datafiles("disk:/home/informix/dbgen_article/orders.tbl"));
create external table "informix".lineitem_ext sameas lineitem 
using ( datafiles("disk:/home/informix/dbgen_article/lineitem.tbl"));

-- Load the raw (non-logged) tables from the external tables.
insert into nation select * from nation_ext;
insert into region select * from region_ext;
insert into supplier select * from supplier_ext;
insert into customer select * from customer_ext;
insert into part select * from part_ext;
insert into partsupp select * from partsupp_ext;
insert into lineitem select * from lineitem_ext;
insert into orders select * from orders_ext;

-- Convert the loaded tables from raw (non-logged) to standard (logged).
alter table nation type (standard);
alter table region type (standard);
alter table supplier type (standard);
alter table customer type (standard);
alter table part type (standard);
alter table partsupp type (standard);
alter table orders type (standard);
alter table lineitem type (standard);

-- Unique indexes on the primary-key columns.
create unique index region_pk on region (r_regionkey);
create unique index nation_pk on nation (n_nationkey);
create unique index supplier_pk on supplier (s_suppkey);
create unique index customer_pk on customer (c_custkey);
create unique index part_pk on part (p_partkey);
create unique index partsupp_pk on partsupp (ps_partkey, ps_suppkey);
create unique index lineitem_pk on lineitem (l_orderkey, l_linenumber);
create unique index orders_pk on orders (o_orderkey);

-- Non-unique indexes on the foreign-key columns.
create index nation_fk_region on nation (n_regionkey);
create index supplier_fk_nation on supplier (s_nationkey);
create index partsupp_fk_part on partsupp (ps_partkey);
create index partsupp_fk_supplier on partsupp (ps_suppkey);
create index customer_fk_nation on customer (c_nationkey);
create index orders_fk_customer on orders (o_custkey);
create index lineitem_fk_orders on lineitem (l_orderkey);
create index lineitem_fk_part on lineitem (l_partkey);
create index lineitem_fk_supplier on lineitem (l_suppkey);

-- Primary-key constraints on the columns indexed above.
alter table region add constraint (primary key (r_regionkey));
alter table nation add constraint (primary key (n_nationkey));
alter table supplier add constraint (primary key (s_suppkey));
alter table customer add constraint (primary key (c_custkey));
alter table part add constraint (primary key (p_partkey));
alter table partsupp add constraint (primary key (ps_partkey, ps_suppkey));
alter table lineitem add constraint (primary key (l_orderkey, l_linenumber));
alter table orders add constraint (primary key (o_orderkey));

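-- Foreign-key constraints; novalidate skips checking the existing rows when each constraint is created.
alter table nation add constraint (foreign key (n_regionkey) references region novalidate);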
alter table supplier add constraint (foreign key (s_nationkey) references nation novalidate);
alter table partsupp add constraint (foreign key (ps_partkey) references part novalidate);
alter table partsupp add constraint (foreign key (ps_suppkey) references supplier novalidate);
alter table customer add constraint (foreign key (c_nationkey) references nation novalidate);
alter table orders add constraint (foreign key (o_custkey) references customer novalidate);
alter table lineitem add constraint (foreign key (l_orderkey) references orders novalidate);
alter table lineitem add constraint (foreign key (l_partkey) references part novalidate);
alter table lineitem add constraint (foreign key (l_suppkey) references supplier novalidate);

-- Print the current timestamp to mark the end of the run.
select current from systables where tabid = 1;
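
Because the foreign keys above are added with novalidate, the existing rows are not checked when the constraints are created. The following is a minimal sketch, not part of the original script, of how the load could be sanity-checked afterwards: it counts the rows in each table and then looks for child rows without a parent. The column aliases are illustrative names only. As a reference point, at scale factor 1 the TPC-H specification lists 5 region, 25 nation, 10,000 supplier, 150,000 customer, 200,000 part, 800,000 partsupp, 1,500,000 orders and 6,001,215 lineitem rows; the counts to expect depend on the scale factor that was passed to dbgen.

```sql
-- Row counts for each loaded table.
select count(*) as region_rows   from region;
select count(*) as nation_rows   from nation;
select count(*) as supplier_rows from supplier;
select count(*) as customer_rows from customer;
select count(*) as part_rows     from part;
select count(*) as partsupp_rows from partsupp;
select count(*) as orders_rows   from orders;
select count(*) as lineitem_rows from lineitem;

-- Orphan checks: each query counts child rows whose foreign key has no
-- matching parent row. A result of 0 means that relationship is complete.
select count(*) as orphan_nation from nation n
 where not exists (select 1 from region r where r.r_regionkey = n.n_regionkey);
select count(*) as orphan_supplier from supplier s
 where not exists (select 1 from nation n where n.n_nationkey = s.s_nationkey);
select count(*) as orphan_customer from customer c
 where not exists (select 1 from nation n where n.n_nationkey = c.c_nationkey);
select count(*) as orphan_partsupp_part from partsupp ps
 where not exists (select 1 from part p where p.p_partkey = ps.ps_partkey);
select count(*) as orphan_partsupp_supp from partsupp ps
 where not exists (select 1 from supplier s where s.s_suppkey = ps.ps_suppkey);
select count(*) as orphan_orders from orders o
 where not exists (select 1 from customer c where c.c_custkey = o.o_custkey);
select count(*) as orphan_lineitem_order from lineitem l
 where not exists (select 1 from orders o where o.o_orderkey = l.l_orderkey);
select count(*) as orphan_lineitem_part from lineitem l
 where not exists (select 1 from part p where p.p_partkey = l.l_partkey);
select count(*) as orphan_lineitem_supp from lineitem l
 where not exists (select 1 from supplier s where s.s_suppkey = l.l_suppkey);
```

The orphan checks on lineitem scan the largest table, so on bigger scale factors they may take a while; they can be run selectively.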

Conclusion

For testing and POC exercises, what is often required is a populated database of a specific size. This article provides enough information to implement the TPCH database with a data population of a chosen size, from 1 GB upward, and demonstrates loading it through IBM Informix external tables from “disk” or “pipe”.

Disclaimer

The code provided above is supplied “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.