Most modern Linux distributions install a journaling filesystem by default, i.e. a filesystem that keeps a log (journal) of changes. While this provides excellent recovery for ordinary files on the disk, journaling can be significantly detrimental to a database server. Raw devices are becoming obsolete on many distributions, as DIRECT_IO can give comparable performance. Choosing the right filesystem and the appropriate features is an important decision for a database server administrator; getting this right from the start can prevent headaches further down the line.
Modern Linux distributions offer a wide choice of filesystem types; some of these journal by default, while others offer journaling as an option. The following table shows some of the more common offerings.
|Filesystem||Journaled||Can turn journaling off|
* Not recommended
While we would advise against a journaling filesystem for your database storage, there is no reason not to mix and match; e.g. have your root filesystem on ext3, and your dbspaces on ext4 with journaling turned off.
How do I know which filesystems I have and with which options?
Filesystems to be mounted at boot time are listed in /etc/fstab, while /etc/mtab and the mount command (with no parameters) list the currently mounted filesystems, together with their types and mount options.
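As a minimal sketch, assuming a Linux system that exposes the kernel's mount table in /proc/mounts, the same information that mount prints can be laid out in columns. The function name list_mounts and its optional file argument are illustrative choices, not part of any standard tool.

```shell
# Print device, mount point, filesystem type and mount options for each
# mounted filesystem. Reads /proc/mounts unless another mounts-style
# file is given as the first argument (an illustrative convenience).
list_mounts() {
    # Field order in /proc/mounts: device mountpoint fstype options dump pass
    awk '{ printf "%-24s %-24s %-8s %s\n", $1, $2, $3, $4 }' "${1:-/proc/mounts}"
}
```

Running list_mounts with no argument shows the live table; the third column is the filesystem type (ext2, ext3, ext4 and so on) and the fourth holds the mount options.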
To find out if your ext2, ext3 or ext4 filesystem has journaling enabled, run the dumpe2fs or debugfs command as root, and check for has_journal in the list of filesystem features in the output.
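One way to automate that check is sketched below. It assumes the header printed by dumpe2fs -h contains a "Filesystem features:" line, as current e2fsprogs releases do; the helper name journal_enabled and the device /dev/sdb1 are placeholders.

```shell
# Succeeds (exit status 0) when the dumpe2fs header read from standard
# input lists has_journal among the filesystem features.
journal_enabled() {
    awk '/^Filesystem features:/ {
             for (i = 3; i <= NF; i++) if ($i == "has_journal") found = 1
         }
         END { exit !found }'
}

# Usage, as root (/dev/sdb1 is a placeholder device name):
#   dumpe2fs -h /dev/sdb1 2>/dev/null | journal_enabled && echo "journaling is on"
```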
What’s so bad about a journaling filesystem?
Generally, journaling is a good feature to have on a filesystem, as it can significantly speed up recovery after a power failure; however, it only really pays off on filesystems holding a large number of small files with relatively few changes. For database files, which can receive thousands of updates a second and usually implement their own recovery methods anyway, it is an unnecessary overhead. Some older journaling filesystems can also cause inconsistencies by flushing their metadata to disk before the related data changes, potentially causing data corruption. Other modern features such as filesystem read-ahead and access time updates should also be disabled for maximum performance.
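Access time updates are normally switched off with the noatime mount option. The sketch below checks whether a given mount point already has it, assuming the mount table is available in /proc/mounts; the helper name mounted_noatime, its optional second argument and the /dbspace mount point are hypothetical.

```shell
# Succeeds when the mount point given as $1 is mounted with noatime.
# $2 optionally names a mounts-style file to read instead of /proc/mounts
# (an illustrative convenience for trying the check on sample data).
mounted_noatime() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "noatime") found = 1
        }
        END { exit !found }' "${2:-/proc/mounts}"
}

# To switch access time updates off on a mounted filesystem, as root
# (/dbspace is a placeholder mount point):
#   mount -o remount,noatime /dbspace
```

Adding noatime to the options column of the matching /etc/fstab entry keeps the setting across reboots.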
You can turn journaling off on an unmounted ext4 filesystem by clearing its has_journal feature with the tune2fs command.
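A minimal sketch of that step follows, assuming the e2fsprogs tools are installed and the filesystem is not in use. The wrapper name disable_journal and the device /dev/sdb1 are placeholders; substitute the device that holds your dbspaces.

```shell
# Remove the journal from an ext3/ext4 filesystem (run as root).
# tune2fs only clears has_journal when the filesystem is unmounted or
# mounted read-only, so unmount it before calling this.
disable_journal() {
    # The leading ^ tells tune2fs to clear the named feature.
    tune2fs -O '^has_journal' "$1"
}

# Usage (/dev/sdb1 is a placeholder device name):
#   umount /dev/sdb1 && disable_journal /dev/sdb1
```

Quoting '^has_journal' keeps shells that treat ^ specially from altering the argument.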
We strongly recommend running a filesystem check, for example with e2fsck, after this operation.
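The check can be sketched as below, assuming e2fsck from e2fsprogs and an unmounted filesystem. The function names and /dev/sdb1 are placeholders, and the messages are a simplified reading of the exit status bits documented in the e2fsck manual page (1 and 2 mean errors were corrected, 4 means errors remain, 8 and above mean the check itself failed).

```shell
# Translate an e2fsck exit status (a bit mask) into a short message.
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "clean"
    elif [ "$code" -ge 8 ]; then
        echo "check did not complete (status $code)"
    elif [ $(( code & 4 )) -ne 0 ]; then
        echo "errors left uncorrected"
    else
        echo "errors corrected"
    fi
}

# Force a full check even if the filesystem is marked clean, then report.
check_fs() {
    e2fsck -f "$1"
    describe_fsck_status $?
}

# Usage, as root on an unmounted filesystem (placeholder device name):
#   check_fs /dev/sdb1
```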
Modern filesystems come with some great new features that can increase the reliability, performance and recoverability of your operating system, but these features can clash with similar features already built into your database server. Transaction logging, DIRECT_IO and a buffer cache have been features of IBM data servers for quite some time, making the equivalent filesystem features an unnecessary and potentially disastrous overhead. Choosing a filesystem without these features (such as ext2), or one where they can be disabled (such as ext4), for your database storage should avoid these issues.
The code fix suggested above is provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.