Informix V12 - Fragmentation Vs. Sharding

Abstract

Fragmentation vs Sharding. So, what is it?

Fragmentation has been available in Informix since V7.00 which was released in the mid-nineties. It allows you to group data rows and/or indexes for a table according to a user-defined distribution scheme and physically place these fragments in separate dbspaces and on separate physical disks. This can bring benefits to query performance by spreading the I/O over multiple devices, by elimating the need to read fragments not relevant to a particular query or even scanning multiple fragments in parallel.

As data volumes grow, the ability to fragment large tables across multiple dbspaces can also reduce the requirement to create dbspaces with larger page sizes and the additional buffer pools required for them.

But, in today’s Big Data era, as data storage requirements grow at an ever increasing pace, what if the performance and capacity of a single server can no longer meet these demands ?

One possible answer could be Sharding.

Sharding was introduced at V12, it allows you to group data rows and index keys for a table according to a user-defined distribution scheme and physically place these fragments on separate servers, locally or remotely. This allows the resources of some or all of these servers to be used when processing queries.

As your database grows, rather than scaling up by adding more processors and/or RAM to an existing server, you can scale out by adding more servers. Also, as Sharding makes use of Informix Enterprise Replication, there is no requirement for the server hardware and operating systems to be the same. Read on to discover the pros and cons of Informix V12 Fragmentation vs Sharding.

Content

As an example of Fragmentation, the following SQL will create the lightning table, fragmented by expression using the country column and create a Primary Key Constraint in the IDXDBS1 dbspace :-

Copy to Clipboard

Any row inserted with the country column < 100 will go into the datadbs1 dbspace, those with the country column between 100 and 199 will go into datadbs2 etc., any rows where the country column does not match one of the conditions, the remainder, will go into datadbs5.

One benefit of this distribution scheme may be the elimination of certain fragments when executing a query that includes a range of the column country, eg:-

Copy to Clipboard

Another may be reading each fragment in parallel using the observed column, eg:-

Copy to Clipboard

We can extend the principal of the fragmented lightning table further by creating a Sharded table. This will use five separate database servers, shard_serv_1 to shard_serv_5, as opposed to the five dbspaces of the fragmented table, DATADBS1 to DATADBS5. The examples below assume all of the required database servers are already built, On-Line and they belong to the same Enterprise Replication Domain.

Each Shard Server must have a unique SHARD_ID within the Shard Cluster, this ensures Sharded Queries run in parallel on all servers, this is an onconfig parameter and if not already present, it can be set with the following :-

Copy to Clipboard

If it is already set, any change will require the editing of the onconfig file and a restart of the database server.

Each Shard Server must also have an identical database and table created, eg.

Copy to Clipboard

Once all the tables are created, we can define the Sharded table using cdr define shardCollection :-

Copy to Clipboard

(– type delete is the default behaviour. This allows the insert of data to the table on any of the database servers in the Shard Cluster. The rows are replicated to the appropriate Shard Server and then deleted from the original server. This is also the same behaviour for sharding defined using MongoDB commands. For the other options please see https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.erep.doc/ids_erp_640.htm

As with the earlier Fragmentation example, there must be a remainder Shard Server to catch any data that does not meet any earlier condition when sharding using an expression.

When querying, updating or deleting data from a Sharded table, operations are performed only on the local Shard of the table by default, to work with the complete Sharded table, you must set the USE_SHARDING environment variable from within your session, ie.

Copy to Clipboard

Finally, while Fragmentation is not available with the Informix Work Group Edition, interestingly, Sharding is, both in a purely Work Group Edition estate or mixed with Enterprise Edition instances.

Caveats

You cannot run a statement that contains an update to a Shard key that would require a row to move from one Shard server to another, the row must be deleted then inserted with the new values.

Conclusion

Hopefully this article has deceived the differences between Fragmentation vs Sharding.

Fragmentation is a way to partition horizontally a single table across multiple dbspaces on a single server. Sharding extends this capability to allow the partitioning of a single table across multiple database servers in a shard cluster. You query both a fragmented table and a sharded table in the same way. You do not need to know where the data is. Queries that are performed on one shard server retrieve the relevant data from other servers in a shard cluster. You can add shard servers to the shard cluster as your data grows.

For Work Group Edition users, Sharding possibly provides a more cost-effective solution when catering for database growth than an upgrade to the Enterprise Edition. It allows for unlimited Enterprise Replication Nodes that tables could be sharded across.

Disclaimer

Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.