The Impact of DIRECT_IO and File System Caching

Abstract

This article follows on from the earlier TPC-C benchmarking performed on IDS 12.10; if you missed it, you can read it here. It examines DIRECT_IO and its use within IDS, and highlights the impact of file system caching on Linux, which is relevant for those editions of Informix where DIRECT_IO cannot be used. Finally, it includes a ‘just for fun’ comparison of DIRECT_IO and RAW devices, for readers to draw their own conclusions.

Content

Traditionally, an Informix database system would store its data on a RAW device. Using this method, Informix would communicate directly with the storage device and bypass the operating system’s file management and filesystem caching layer. This is generally considered to provide the best performance without compromising integrity.

DIRECT_IO is designed to mimic the behaviour and performance of RAW device access when Informix is accessing data stored in “cooked” files (i.e. data stored on a filesystem managed by the OS). DIRECT_IO is not new in the world of Linux and Informix (O_DIRECT was introduced to the Linux 2.4.10 kernel in 2001). However, in our experience it is still not commonly used when configuring Informix storage.

We were interested to see the effect that file system caching has in masking underlying I/O performance bottlenecks, and how enabling or disabling elements of filesystem caching might affect Informix performance working on both RAW and DIRECT_IO configurations.

It is important to note that IBM Informix Innovator-C Edition is not licensed to use DIRECT_IO.

Note: the following scenarios, which adjust file system caching, are not recommended on any system, as performance will be adversely affected.

All of the tests described in this document were performed on the following environment:

System Specification:

• Dell M6600
• Intel Core i7-2860QM (2.50GHz, 4 Cores, 8 Logical Processors)
• 32GB RAM
• 7200RPM SCSI HDD *

* Configured with an ext2 filesystem for the DIRECT_IO tests to avoid journaling and to honour the mount option ‘sync’.

Tests Performed:

1. DIRECT_IO Enabled
2. DIRECT_IO Disabled with File System Caching untouched – (We expect this to be faster)

(We expect the following two changes to degrade performance when DIRECT_IO is disabled)
3. DIRECT_IO Disabled with Read Caching Disabled
4. DIRECT_IO Disabled with Write Caching Disabled

(Expected to be similar to DIRECT_IO ON)
5. DIRECT_IO Disabled with both Read and Write Caching Disabled
6. DIRECT_IO Disabled with both Read and Write Caching Disabled and performing a continuous cache flush

(Expected to be comparable to DIRECT_IO Enabled with File System Caching untouched)
7. DIRECT_IO Enabled with both Read and Write Caching Disabled

(Expected to be comparable to DIRECT_IO Enabled with File System Caching untouched)
8. RAW

Enabling and Checking DIRECT_IO Settings:

Our first step was to enable DIRECT_IO. To do this, set the $ONCONFIG parameter DIRECT_IO to 1. This requires an engine restart to take effect, and applies only when using COOKED files.
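As a minimal sketch, the change and restart could look like the following (the sed edit and the standard $INFORMIXDIR/etc path are illustrative; edit the file however you normally manage $ONCONFIG):

```shell
# Set DIRECT_IO to 1 in the $ONCONFIG file
# (assumes INFORMIXDIR and ONCONFIG are set for the instance)
sed -i 's/^DIRECT_IO[[:space:]].*/DIRECT_IO 1/' "$INFORMIXDIR/etc/$ONCONFIG"

# Restart the engine so the change takes effect
onmode -ky   # take the instance offline
oninit       # bring it back online with DIRECT_IO enabled
```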

We can validate DIRECT_IO is being used at the Informix level in two ways: either (a) examine the Informix configuration or (b) examine the underlying Informix storage layout. Both techniques are explored below.

To examine the Informix configuration, use ‘onstat -g cfg DIRECT_IO’ (a feature introduced in version 12.10), or query the sysmaster:sysconfig table.

informix@demo-prodhost:~> onstat -g cfg DIRECT_IO

IBM Informix Dynamic Server Version 12.10.FC4WE -- On-Line -- Up 01:06:33 -- 8666392 Kbytes

name                      current value
DIRECT_IO                 1
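The sysmaster:sysconfig query mentioned above can be run via dbaccess; as a sketch (the cf_name and cf_effective column names are the usual sysconfig columns, and the effective value should show 1 when DIRECT_IO is active):

```shell
# Query sysmaster:sysconfig for the effective DIRECT_IO setting
echo "SELECT cf_name, cf_effective FROM sysconfig WHERE cf_name = 'DIRECT_IO';" \
    | dbaccess sysmaster -
```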

To examine the Informix storage layout use ‘onstat –d’:


informix@demo-prodhost:~> onstat -d

IBM Informix Dynamic Server Version 12.10.FC4WE -- On-Line -- Up 01:06:37 -- 8666392 Kbytes

…

Chunks
address          chunk/dbs     offset     size       free       bpages     flags pathname
44ec9258         1      1      0          131072     122147                PO-B-D /home/informix/tpcc_data/chunks/rootdbs
475ee028         2      2      0          524288     1350                  PO-B-D /home/informix/tpcc_data/chunks/physdbs
4770a028         3      3      0          524288     32715                 PO-B-D /home/informix/tpcc_data/chunks/logdbs
47891028         4      4      0          524288     524235                PO-B-- /home/informix/tpcc_data/chunks/tempdbs_01
47551028         5      5      0          524288     524235                PO-B-- /home/informix/tpcc_data/chunks/tempdbs_02
48113028         6      6      0          524288     524235                PO-B-- /home/informix/tpcc_data/chunks/tempdbs_03
47a74028         7      7      0          524288     524235                PO-B-- /home/informix/tpcc_data/chunks/tempdbs_04
47a77028         8      8      0          32768      30487      30487      POSB-D /home/informix/tpcc_data/chunks/sbspace_01
                                 Metadata 2228       1657       2228
479a8028         9      9      0          3145728    117475                PO-B-D /home/informix/tpcc_data/chunks/datadbs_01
47a7a028         10     10     0          3145728    2251995               PO-B-D /home/informix/tpcc_data/chunks/datadbs_02
 10 active, 32766 maximum

The 'D' flag in the sixth position of the flags column indicates that the chunk has been opened using the Linux O_DIRECT flag (DIRECT_IO).

From this point everything indicates that IDS is using DIRECT_IO; however, we can validate this further at the OS level in two ways. From Linux 2.6.22 onwards, we can check the flag information in a process’s open file descriptors under /proc/$PID/fdinfo/$FD:

demo-prodhost:/ # ps -ef | grep informix | grep oninit | head -1
informix 14182     1  2 17:22 ?        00:01:48 oninit -ivwy

demo-prodhost:/ # ls -l /proc/14182/fd/256
lrwx------ 1 root root 64 May 10 18:30 /proc/14182/fd/256 -> /informix_data/IBM/informix/demo_prodhost/chunks/rootdbs

demo-prodhost:/ # cat /proc/14182/fdinfo/256
pos:    0
flags:  0150002
mnt_id: 43
 

This method requires decoding the octal flags value for each open file descriptor, so a simpler method exists: the Linux utility 'lsof' provides a human-readable, decoded version of the same octal flags.
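For completeness, the manual decode can be done in the shell. This sketch assumes bash and the x86 Linux value of O_DIRECT, which is 040000 octal (the value differs on some other architectures):

```shell
# Decode the octal flags value read from /proc/$PID/fdinfo/$FD.
# On x86 Linux, O_DIRECT is 040000 (octal); bash arithmetic treats
# a leading-zero literal as octal, so the bit test works directly.
flags=0150002   # value shown in the fdinfo output above
if (( (flags & 040000) != 0 )); then
    echo "O_DIRECT is set"      # → printed for 0150002
else
    echo "O_DIRECT is not set"
fi
```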

In the lsof output, the "DIR" FILE-FLAG indicates that O_DIRECT (DIRECT_IO) is being used.

Notice that the temporary dbspaces do not use DIRECT_IO (as also shown in the ‘onstat -d’ output above).


demo-prodhost:/ # lsof +fg /home/informix/tpcc_data/chunks/* | grep DIR
oninit  14182 informix  256u   REG RW,SYN,DIR,0x8000   8,17  268435456 7864330 /informix_data/IBM/informix/demo_prodhost/chunks/rootdbs
oninit  14182 informix  257u   REG RW,SYN,DIR,0x8000   8,17   67108864 7864339 /informix_data/IBM/informix/demo_prodhost/chunks/sbspace_01
oninit  14182 informix  258u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864334 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_02
oninit  14182 informix  259u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864331 /informix_data/IBM/informix/demo_prodhost/chunks/physdbs
oninit  14182 informix  260u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864332 /informix_data/IBM/informix/demo_prodhost/chunks/logdbs
oninit  14182 informix  261u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864333 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_01
oninit  14237     root  256u   REG RW,SYN,DIR,0x8000   8,17  268435456 7864330 /informix_data/IBM/informix/demo_prodhost/chunks/rootdbs
oninit  14237     root  257u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864331 /informix_data/IBM/informix/demo_prodhost/chunks/physdbs
oninit  14237     root  258u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864332 /informix_data/IBM/informix/demo_prodhost/chunks/logdbs
oninit  14237     root  263u   REG RW,SYN,DIR,0x8000   8,17   67108864 7864339 /informix_data/IBM/informix/demo_prodhost/chunks/sbspace_01
oninit  14237     root  264u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864333 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_01
oninit  14237     root  265u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864334 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_02
oninit  14264 informix  256u   REG RW,SYN,DIR,0x8000   8,17  268435456 7864330 /informix_data/IBM/informix/demo_prodhost/chunks/rootdbs
oninit  14264 informix  257u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864331 /informix_data/IBM/informix/demo_prodhost/chunks/physdbs
oninit  14264 informix  258u   REG RW,SYN,DIR,0x8000   8,17   67108864 7864339 /informix_data/IBM/informix/demo_prodhost/chunks/sbspace_01
oninit  14264 informix  259u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864333 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_01
oninit  14264 informix  260u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864332 /informix_data/IBM/informix/demo_prodhost/chunks/logdbs
oninit  14264 informix  261u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864334 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_02
oninit  14265 informix  256u   REG RW,SYN,DIR,0x8000   8,17  268435456 7864330 /informix_data/IBM/informix/demo_prodhost/chunks/rootdbs
oninit  14265 informix  257u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864332 /informix_data/IBM/informix/demo_prodhost/chunks/logdbs
oninit  14265 informix  258u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864331 /informix_data/IBM/informix/demo_prodhost/chunks/physdbs
oninit  14265 informix  259u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864333 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_01
oninit  14265 informix  260u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864334 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_02
oninit  14266 informix  256u   REG RW,SYN,DIR,0x8000   8,17  268435456 7864330 /informix_data/IBM/informix/demo_prodhost/chunks/rootdbs
oninit  14266 informix  257u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864331 /informix_data/IBM/informix/demo_prodhost/chunks/physdbs
oninit  14266 informix  258u   REG RW,SYN,DIR,0x8000   8,17   67108864 7864339 /informix_data/IBM/informix/demo_prodhost/chunks/sbspace_01
oninit  14266 informix  259u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864333 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_01
oninit  14266 informix  260u   REG RW,SYN,DIR,0x8000   8,17 6442450944 7864334 /informix_data/IBM/informix/demo_prodhost/chunks/datadbs_02
oninit  14266 informix  261u   REG RW,SYN,DIR,0x8000   8,17 1073741824 7864332 /informix_data/IBM/informix/demo_prodhost/chunks/logdbs

Using the methods above, we can accurately validate that Informix is using DIRECT_IO.

The following output shows an I/O test with and without O_DIRECT, which clearly demonstrates the impact of file system caching at the system level: cached reads are orders of magnitude faster when the file system cache is exploited, as is fully expected.

demo-prodhost:~ # hdparm -tT /dev/sdb1

/dev/sdb1:
 Timing cached reads:   23420 MB in  2.00 seconds = 11727.77 MB/sec
 Timing buffered disk reads:  78 MB in  3.51 seconds =  22.22 MB/sec

demo-prodhost:~ # hdparm --direct -tT /dev/sdb1

/dev/sdb1:
 Timing O_DIRECT cached reads:   164 MB in  2.01 seconds =  81.79 MB/sec
 Timing O_DIRECT disk reads:  78 MB in  3.19 seconds =  24.43 MB/sec

Disabling all elements of the caching system at the OS level is difficult, and is not something you would ever want to do in a real-life situation.

To attempt to disable the read-cache, the mount option “sync” was used:

demo-prodhost:~ # umount /dev/sdb1
demo-prodhost:~ # mount -o sync /dev/sdb1
demo-prodhost:~ # cat /proc/mounts | grep sdb1
/dev/sdb1 /informix_data ext2 rw,sync,relatime 0 0

To attempt to disable the write-cache the following was used:

demo-prodhost:~ # hdparm -W0 /dev/sdb1

/dev/sdb1:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)

The above two were used in combination to attempt to “fully” disable file system caching, with a final test that also performed a continuous cache flush using the following method (writing 3 to drop_caches frees both the page cache and the dentry/inode caches):

while true; do sync; echo "3" > /proc/sys/vm/drop_caches; done

Before looking at the results, a final comparative test was performed, DIRECT_IO vs. RAW, on our Amazon EC2 cloud environment, where disk I/O performance far exceeded that of the base commodity hardware and should provide a fairer comparison between the two.

Results:

The following graph shows the I/O performance for each test taken from ‘onstat -g ckp’:

[Image 1: I/O performance for each test]

From our perspective, there are three interesting observations here:

1) File system caching can massively mask I/O performance, and with database technology we want to make sure the data is written to disk, to preserve data integrity and avoid corruption.
2) Based on the above tests and results, it strongly appears that file system caching is still used at some level when DIRECT_IO is enabled, as turning elements of file system caching off while using DIRECT_IO severely affects the Diskflsh Time.
3) DIRECT_IO and RAW produce very similar results.

Conclusion

The above article is intended to outline the different performance characteristics between RAW and non-RAW device access and to describe the performance relationship between DIRECT_IO and filesystem caching.

Disclaimer

The above is provided "as is", without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.