Wednesday, 23 July 2008

Tech Tip du Jour: Fragmenting Strategies

Thanks to some wittering from the peanut gallery, I've decided to get onto this a bit earlier than anticipated. Bastards.

There are a number of reasons why you might want a fragmentation strategy:
  1. You may want to eliminate dbspaces from being queried in order to increase throughput or concurrency
  2. You may want to invoke parallelism
  3. You may want easier data maintenance

There may be other reasons, but that's all I can think of for now.

Fragment Elimination
The reasoning here is that you would say "we'll put all department X's data in dbspaceX and all department Y's data in dbspaceY" (or some other strategy) and then when a user from department X logs in and does work, the engine will automatically exclude dbspaceY from being queried.

The problem is that in real life, data rarely gets separated out so clearly. And while that approach might work for application A, application B might have a different fragmentation requirement, rendering the given approach utterly useless.

But it's great if you can get there.

Parallelism
It doesn't really matter which fragmentation strategy you use, parallelism is always a gain. If you can't work out a fragment elimination strategy, use a mod function for even data distribution, or use round robin fragmentation. You will find that bulk operations like index builds become massively quicker, especially if you also fragment the index that you are building. You can also get dbspaces scanned in parallel, which is great for big aggregation operations.

However, don't use round robin with attached indexes -- this effectively renders the indexes useless.

Data Maintenance
One of the more common uses for fragmentation is the ability to detach fragments into separate tables, archive them and then drop the detached tables. This is a way doing very quick data purges. This is most common in data warehouses where data is dropped on a month by month or year by year basis. Just remember that your indexes either have to be attached, or fragmented on a similar strategy (the dbspaces can be different, but otherwise the strategy for the index must be the same as that for the data) or the index will be dropped and rebuilt by the engine when you detach a fragment.

Placement
Even if you don't want to make use of any of the above, there is something to be said for placing your data and indexes explicitly so that indexes for a table are in different dbspaces to the data (although this is probably more true of JBoD setups, where you know exactly where things are in disks -- it's probably not so relevant on RAID or SAN.) This will at least make it possible to use onstat -g ppf properly, though!

I somehow feel that this subject might get re-visited.

No comments: