
Friday, 8 August 2008

Tech Tip du Jour: Update them stats!

People are often overly paranoid about UPDATE STATISTICS: running it too often, running it at too high a level, or, worst of all, running the wrong stats.

So, to keep it simple, let's start with what is a bare minimum:

Do an UPDATE STATISTICS LOW on your entire database every three months.

Are you still experiencing any unduly slow queries? Are the underlying tables very active? Perhaps try UPDATE STATISTICS LOW on those tables monthly, or even weekly. If you find that UPDATE STATISTICS fixes things for a while, but then performance tails off, you probably just need to UPDATE STATISTICS more often. However, more than once a week is probably an indication of a different problem.
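In dbaccess terms, that bare minimum is just this (a sketch; the table names are examples):

```sql
-- Quarterly, across the whole database:
UPDATE STATISTICS LOW;

-- Monthly or weekly, for the busy tables that keep going slow:
UPDATE STATISTICS LOW FOR TABLE orders;
UPDATE STATISTICS LOW FOR TABLE items;
```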

So, assuming you still have query problems, it's time to start taking time to look into other levels of statistics. However, do not assume that UPDATE STATISTICS MEDIUM is necessarily any better than UPDATE STATISTICS LOW. I've seen more pain from this assumption than I care to remember.

Identify the queries which are still running badly (onstat -g sql), identify the tables that they query (you will almost certainly find one or two tables are common to most of them) and fix the statistics on these.

And now it's time for me to slip into lazy Delia mode: out there (try the IIUG software archive) is a program called dostats.ec, written by a very clever man called Art Kagel, who has saved us all a lot of trouble -- compile it and run it against the previously identified tables. The temptation may be there to run it against the entire database, but don't succumb, because in many cases it will just be wasted disk space and wasted cycles.

Think of your carbon footprint!

Just kidding! But seriously, don't overdo your stats, OK?

Thursday, 7 August 2008

Tech Tip du Jour: CPU load

Today I'm going to take a brief step outside the Informix box and talk about sar and vmstat and one very particular issue I've seen: if sar or vmstat (or whatever) reports that you have less than an order of magnitude more %user than %sys, you have got a problem.

Let me illustrate:


informix@freebnort:/home/informix
pts/4:ksh$ sar 5 12
Linux 2.6.16.42-0.10-smp (freebnort) 07/08/08

22:25:12 CPU %user %nice %sys %iowait %idle
22:25:17 all 10.66 0.00 12.52 0.00 76.83
22:25:22 all 9.13 0.00 12.30 0.00 78.57
22:25:27 all 12.04 0.00 12.76 0.00 75.20
22:25:32 all 11.46 0.00 12.69 0.00 75.85
22:25:37 all 10.14 0.00 12.69 0.00 77.17
22:25:42 all 12.81 0.00 13.17 0.00 74.02
22:25:47 all 12.15 0.00 12.71 0.00 75.14
22:25:52 all 13.76 0.00 13.10 0.00 73.14
22:25:57 all 21.79 0.00 12.89 0.00 65.32
22:26:02 all 14.84 0.00 12.64 0.00 72.53
22:26:07 all 12.11 0.00 12.47 0.00 75.42
22:26:12 all 13.43 0.00 13.18 0.00 73.38
Average: all 12.86 0.00 12.76 0.00 74.38

informix@freebnort:/home/informix
pts/4:ksh$


(Pardon the formatting, blogger is doing my head in on this.)

As you can see, the %sys is the same order of magnitude as the %user. Given that this server is only running IDS, something is wrong: the %user should be at least an order of magnitude greater than the %sys. In general, if a database server is only a database server and %user is 10%, then %sys should be about 1%. If the server has a lot of connections this might creep a bit higher, but %sys shouldn't generally be more than about 3% when %user is about 10%.
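If you want to automate the eyeballing, the rule of thumb above can be sketched like so (a rough check; the awk column positions assume the Linux sysstat sar layout shown above, so adjust for your platform):

```shell
# Warn if %sys is within an order of magnitude of %user on the
# sar "Average:" line ($3 = %user, $5 = %sys in the layout above).
check_ratio() {
  awk '$1 == "Average:" { user = $3; sys = $5 }
       END { if (sys * 10 > user) print "WARN"; else print "OK" }'
}
# Usage: sar 5 12 | check_ratio
```

Run against the output above, this prints WARN -- which is exactly the point.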

Things to watch out for are:
  1. Swapping, especially if %sys is significantly greater than %user and the server doesn't seem to be doing anything.
  2. Very high rates of network activity, such as lots of connect / disconnect activity; or poorly configured connectivity, leading to excessive transmission of small packets, rather than fewer transmissions of larger packets.

Obviously, there could be many other causes, but these are the most common.

Wednesday, 6 August 2008

Tech Tip du Jour: How much data are you moving around?

This one is a really obvious one, but it's so often overlooked in this day and age of experienced programmers who are inexperienced in the ways of databases: think about the amount of data you're moving around in those SQL statements you're bandying about.

Scenario 1:

Table tablea contains a BLOB/CLOB/BYTE/whatever. Programmer uses SELECT * on the table when he doesn't need to access the large object.

Result? Poor performance and massive overhead on the server (and network) for no reason whatsoever.

Fix? Only select the columns you actually need, whether the table contains a large object or not.

Scenario 2:

Table tableb contains 16 different versions of an LVARCHAR(1000) field. This table is the core of the system and is hammered all day long by thousands of users. Only one version of the LVARCHAR is ever actually valid at any given moment.

Result? Massive IO load on a server that should really be doing bugger-all. Pointless excess load on the network.

Fix? Normalise the LVARCHAR into a separate table with a version number.
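A hedged sketch of that normalisation (all table and column names are invented):

```sql
CREATE TABLE tableb_text (
  tableb_id INTEGER,
  version   SMALLINT,
  body      LVARCHAR(1000),
  PRIMARY KEY (tableb_id, version)
);

-- The hot table keeps only the current version number; fetch the text
-- with a join when (and only when) you actually need it:
SELECT b.id, t.body
FROM tableb b, tableb_text t
WHERE t.tableb_id = b.id
AND t.version = b.current_version;
```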

Scenario 3:

Table tablec contains a large object. The application has been coded in such a way that rows in tablec are read multiple times when the data has not changed between reads.

Result? Poor performance and massive overhead on the server (and network) for no reason whatsoever.

Fix? Only select rows when you actually need them, whether the table contains a large object or not.

Scenario 4:

Programmer needs to sum the values in a column for a given set of rows. He uses "SELECT * FROM tabled WHERE foo = bar" to pull the required rows into Java objects, then manually sums the values by inspecting the objects. (Really, I'm not making this up.)

Result? Unnecessary network load and poor performance.

Fix? Learn a bit of SQL, and try SELECT sum(columnname).
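The fix, spelled out (the column name is illustrative, the WHERE clause kept from the example above):

```sql
SELECT SUM(total_price)
FROM tabled
WHERE foo = bar;
```

One aggregate row comes back over the network instead of the whole result set.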

I make no apologies whatsoever for the apparent fatuity of this post. Programmers out there are making these exact mistakes right now.

Sunday, 3 August 2008

Tech Tip du Jour: The Sacred Mysteries of the Threshold of the B-Tree Cleaner

Non-geeks may want to look away now. And I know I've done this before, but ...

The B-Tree Cleaner is one of the least understood aspects of the Informix engine. I know this, because I don't understand it at all. However, what little I do know is this:

  1. Its job is to clean indexes of entries marked as deleted and rebalance them
  2. It can cause apparently bizarre performance issues

The reason it causes so much grief is often down to an application design issue: if you have one or more tables where index entries are frequently deleted, then the B-Tree cleaner process has an abnormal amount of work to do. And it shows up as "inexplicable slowness, given the expected workload." It often appears in incomprehensibly long checkpoints, because the IO system gets tied up with the rebalancing and doesn't have the bandwidth to do checkpoints.

Here's something I've seen often: do you have a table that you use as a queue? Where something comes in, gets fiddled with and then gets deleted? That's your problem right there, especially if the table is very active but stays roughly the same size as inserts are roughly matched by deletes. What's happening is this:

  1. Row gets deleted
  2. Index entry gets marked for deletion
  3. B-Tree cleaner notices the new marker, deletes the entry and tries to rebalance the index
  4. New row gets inserted, buggering up the newly-balanced index
  5. GOTO 1 (don't try this at home kids, it rots your teeth and makes your feet itch.)

The essence of this is that the B-Tree cleaner is permanently rebalancing indexes for no real reason, achieving bugger-all at great resource expense. So the way around this is to tell the B-Tree cleaner not to start cleaning at the drop of a hat, but to wait for a certain amount of activity before it does anything. You do this by increasing the threshold: let's say we increase it to 1 million. Rather than rebalancing the index every time a row is deleted, the cleaner now waits until 1 million hits have occurred against that index before it tries to rebalance it.

The chances are that the rebalance will still achieve bugger-all, but because it's not happening so often, the server has more CPU time and IO capacity to do useful things and performance gets much better. It's not helped by the fact that such tables tend to have a zillion indexes on them, because they're generally quite central to a system. So not only do they have lots of rebalances called for, but the rebalances affect a large number of indexes, making the problem even worse. And if the table size is stable but large ... oy, vey!
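For the record, the knob lives in the BTSCANNER onconfig parameter. Something like this (values illustrative; the alice field only appeared in 11.50, so check the onconfig.std for your version before copying anything):

```
BTSCANNER num=1,threshold=1000000,rangesize=-1,alice=6
```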

The way to assess whether the threshold needs raising is onstat -C hot:


IBM Informix Dynamic Server Version 11.50.FC1 -- On-Line -- Up 3 days 10:08:55 -- 20575966 Kbytes

Btree Cleaner Info

Index Hot List
==============
Current Item 2 List Created 01:24:22

List Size 1 List expires in 299 sec
Hit Threshold 5000 Range Scan Threshold 100


Partnum Key Hits
0x079002A3 1 9876 *
0x079002A2 1 8877 *
0x079002A7 1 8008 *
0x079002AB 1 7777 *
0x079002AD 1 6878 *
0x079002AA 1 6444 *
0x079002A9 1 6321 *
0x079002A1 1 6001 *
0x079002A0 1 5448 *
0x079002A4 1 5333 *
0x079002A6 1 5078 *



If this list is always long, and if the "list expires in" is always close to 300 seconds, then the chances are your threshold is too low. Be bold in increasing the threshold. Looking at the list above, I'd start by multiplying the threshold by 10. (I bet you'll see an improvement in performance right away!) If you still see the "list expires in" pretty consistently above 295 seconds, multiply it by 10 again. Keep going until you get to a point where "list expires in" drops below 290 most of the time. Don't worry if it gets to zero, that's a good thing.

(By the way, the asterisk at the end of each line above means that the particular index has been processed. If you run onstat -C hot -r 1, you will see the asterisks "run down" the line as each one is processed.)

When I've sobered up (next week, probably) I'll have a go at the range option, and then wait till I'm drunk again to talk about ALICE.

Wednesday, 23 July 2008

Tech Tip du Jour: Fragmenting Strategies

Thanks to some wittering from the peanut gallery, I've decided to get onto this a bit earlier than anticipated. Bastards.

There are a number of reasons why you might want a fragmentation strategy:
  1. You may want to eliminate dbspaces from being queried in order to increase throughput or concurrency
  2. You may want to invoke parallelism
  3. You may want easier data maintenance

There may be other reasons, but that's all I can think of for now.

Fragment Elimination
The reasoning here is that you would say "we'll put all department X's data in dbspaceX and all department Y's data in dbspaceY" (or some other strategy) and then when a user from department X logs in and does work, the engine will automatically exclude dbspaceY from being queried.

The problem is that in real life, data rarely gets separated out so clearly. And while that approach might work for application A, application B might have a different fragmentation requirement, rendering the given approach utterly useless.

But it's great if you can get there.

Parallelism
It doesn't really matter which fragmentation strategy you use: parallelism is always a gain. If you can't work out a fragment elimination strategy, use a mod function for even data distribution, or use round robin fragmentation. You will find that bulk operations like index builds become massively quicker, especially if you also fragment the index that you are building. You can also get dbspaces scanned in parallel, which is great for big aggregation operations.
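Sketches of the two common approaches (dbspace, table and column names are all invented):

```sql
-- Expression-based, for fragment elimination:
CREATE TABLE sales (
  dept   CHAR(1),
  amount MONEY(8,2)
)
FRAGMENT BY EXPRESSION
  dept = 'X' IN dbspacex,
  dept = 'Y' IN dbspacey,
  REMAINDER IN dbspacez;

-- Round robin, for even distribution and parallel scans:
CREATE TABLE audit_log (
  logged DATETIME YEAR TO SECOND,
  msg    LVARCHAR(200)
)
FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2, dbspace3;
```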

However, don't use round robin with attached indexes -- this effectively renders the indexes useless.

Data Maintenance
One of the more common uses for fragmentation is the ability to detach fragments into separate tables, archive them and then drop the detached tables. This is a way of doing very quick data purges. This is most common in data warehouses where data is dropped on a month by month or year by year basis. Just remember that your indexes either have to be attached, or fragmented on a similar strategy (the dbspaces can be different, but otherwise the strategy for the index must be the same as that for the data) or the index will be dropped and rebuilt by the engine when you detach a fragment.
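A hedged sketch of the purge dance (names invented):

```sql
-- Detach 2007's fragment into its own table:
ALTER FRAGMENT ON TABLE sales DETACH dbsp_2007 sales_2007;

-- ...archive sales_2007 somewhere safe, then:
DROP TABLE sales_2007;
```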

Placement
Even if you don't want to make use of any of the above, there is something to be said for placing your data and indexes explicitly so that indexes for a table are in different dbspaces to the data (although this is probably more true of JBoD setups, where you know exactly where things are in disks -- it's probably not so relevant on RAID or SAN.) This will at least make it possible to use onstat -g ppf properly, though!

I somehow feel that this subject might get re-visited.

Tuesday, 22 July 2008

Tech Tip du Jour: The RIGHT Index

I frequently post about queries not only using an index, but using the right index.

So, what is the right index? Well, it's an index that supports your query as fully as possible. For my examples, I'm going to consider the stores demo database that ships with IDS. It's tiny and it's pretty much optimised, but I'm going to use it to show examples of what I mean.

So, here's the schema:

{ TABLE items row size = 18 number of columns = 6 index size = 30 }
create table items
(
item_num smallint,
order_num integer,
stock_num smallint not null ,
manu_code char(3) not null ,
quantity smallint,
total_price money(8,2),

check (quantity >= 1 ),
primary key (item_num,order_num)
);
revoke all on items from "public";

alter table items add constraint (foreign key (order_num)
references orders );
alter table items add constraint (foreign key (stock_num,
manu_code) references stock );


Here's my test query:

SELECT order_num FROM items
WHERE stock_num = 8
AND quantity > 3
ORDER BY order_num

Here's my first query plan:

select order_num from items
where stock_num = 8
and quantity > 3
order by order_num

Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By

1) items: INDEX PATH

Filters: items.quantity > 3

(1) Index Keys: stock_num manu_code (Serial, fragments: ALL)
Lower Index Filter: items.stock_num = 8

Filters are generally a badness. They indicate that the data row is going to have to be inspected. This isn't a problem with a 67-row table, but it can lead to a lot of pain with a 67-million-row table.

Now I: CREATE INDEX foo1 ON items (stock_num, quantity);

And the plan changes to:

select order_num from items
where stock_num = 8
and quantity > 3
order by order_num

Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By

1) items: INDEX PATH

(1) Index Keys: stock_num quantity (Serial, fragments: ALL)
Lower Index Filter: (items.stock_num = 8 AND items.quantity > 3 )

Notice that there is no longer a filter needed to satisfy the query. But I can improve this query even further: although all the restriction is done via an index, I still have to go to the data rows to get the projection.

If I now: DROP INDEX foo1; CREATE INDEX foo1 ON items (stock_num, quantity, order_num);

The query plan is now:

select order_num from items
where stock_num = 8
and quantity > 3
order by order_num

Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By

1) items: INDEX PATH

(1) Index Keys: stock_num quantity order_num (Key-Only) (Serial, fragments: ALL)
Lower Index Filter: (items.stock_num = 8 AND items.quantity > 3 )

Notice the new "Key-Only" expression. This means that in order to get everything that I needed, I only accessed the index pages, I did not have to access the data pages at all. This is generally optimal.

Now, that was a fairly straightforward case. A more subtle case arises when you have to accept that you can't create an index and you have to choose between indexes:

For the sake of the argument, let's say I can't create any more indexes. We revert to the original index structure and my new query is this:

SELECT order_num FROM items
WHERE stock_num = 8
AND item_num = 1
ORDER BY order_num

My query plan is:

select order_num from items
where stock_num = 8
and item_num = 1
order by order_num

Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By

1) items: INDEX PATH

Filters: items.item_num = 1

(1) Index Keys: stock_num manu_code (Serial, fragments: ALL)
Lower Index Filter: items.stock_num = 8

Is this my best option, given that I have a unique, primary key index on item_num and order_num and a non-unique, composite index on stock_num and manu_code? Well, in this case, yes, because the composite index will be much more selective. Every order will have an item_num of 1, so using the primary key index would be useless. However, if 99% of orders only had one line and we were looking for things on line two, then the primary key might be more useful.

Good optimiser statistics would be your best hope for the engine getting this decision right, but I have on one or two occasions found that the optimiser gets the decision wrong, possibly because of the comparative size of the index or some other, subtle consideration.

Know your data, know what the query is doing and make sure the optimiser is doing the right thing.

Never assume that just because a query is using an index, it's using the correct index.

Update: a minor correction to one of my CREATE INDEX statements, thanks to Ian Goddard for spotting it.

Monday, 21 July 2008

Tech Tip du Jour: Fragmenting Tables (At All)

Non-geeks may want to look away now.

It never fails to amaze me that otherwise bright people seem to think that just using RAID or a SAN is sufficient to spread data around.

Informix was completely rebuilt to make best use of SMP technology, which means that it likes to parallelize tasks where it possibly can.

But Informix cannot be made aware of operating system-level data distribution. If you want to use things like fragment elimination, PDQ, etc., then you have to use Informix's fragmentation. (You can, of course, use this on top of RAID / SAN / whatever.) But if you just use the storage subsystem to partition or fragment the data, then as far as Informix is concerned, it's all in one big pot.

This can not only make all the desirable parallelisation options unavailable, but can also lead to disk contention, long checkpoints, teenage pregnancies and athlete's foot.

I'll post some ruminations on actual fragmentation strategies in the future.

Sunday, 20 July 2008

Tech Tip du Jour: Benchmarking

Non-geeks may want to look away now.

Yes, kids, today we're going to talk about benchmarking. And I know you're all salivating at the prospect. If there is one thing that is guaranteed to get every Informix DBA moist and emotional, it's benchmarking.

Why doesn't IBM release a benchmark? Why doesn't IBM release this benchmark, or that benchmark?

But I'm not talking about that! I'm talking about when last you benchmarked your own environment. Have you ever? In 99 out of every 100 sites I've been to, the answer is "no".

So when users come up to you and say "the system is slower than it was last week", can you refute them? Do you have a representative suite of tests that you can run that reflect the consequences of increased user load and of increased data size? Can you plot a trend to show when you're likely to break your SLA so that you can get kit upgraded in time?

No? You answered "no" to all of the above?

Well, then, perhaps you need to think about writing a set of benchmarks -- they don't need to be complex, some timed dbaccess scripts will be fine! -- that will slow down as your database grows. You also need to set up a separate, static database that you can use to show the consequences of increased user load.

So: take a number of your most common queries, preferably including some that aggregate all your data (what is our average order value, what is our most valuable customer?) as well as some that pull out OLTP data (which items belong to order 4285, what product is associated with product code 1037?) Write dbaccess scripts to time these, execute the scripts and record the times.

Then, create a reasonable subset database, which is static for all time. Execute the same queries against the subset database and record the times.

These performance tests should be run every day, via cron: when the system is idle, when it's busy and when load is about average. Make sure the data is kept in a way that's easy to graph. Review the graphs regularly; you might be surprised at what you find.
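Here's the sort of harness I mean -- a sketch, where the CSV layout, file name and the dbaccess invocation in the comment are all my own invention:

```shell
# Time an arbitrary command and emit a date,label,seconds CSV line.
time_query() {
  label=$1; shift
  start=$(date +%s)
  "$@" > /dev/null 2>&1      # e.g. dbaccess stores7 avg_order.sql
  end=$(date +%s)
  printf '%s,%s,%s\n' "$(date +%Y-%m-%d)" "$label" "$((end - start))"
}
# From cron, append to a file you can graph later:
# time_query avg_order_value dbaccess stores7 avg_order.sql >> bench_times.csv
```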

(If you've got 11.1 or 11.5, you'll find nice ways of integrating this with OAT and dbcron. That's also good. But if you're on 10.0, 9.40, 7.31, 5.20 or SE, do it the old-fashioned way.)

Do your users use ODBC or JDBC? Make sure you have a full suite of network-based tests as well. I can't tell you how many times having a ready suite of network-based tests (simple things, like SQL statements in cells in Excel connected via ODBC) saved my bacon when people came round and said "The database is slow." By showing them a quick test of the Excel suite and comparing it to historical data, it not only showed that the database was fine, but that the network was fine, too. So that meant that the problem was in the application code. Quelle horreur!

I can't be more specific than that, because each database is different, each application requirement is different. But when you have done it, you will have a tool that is worth a thousand TPC benchmarks to your organisation.

Update: I managed to raise the ire of Fernando Nunes over at the Informix Technology blog, he wrote on comp.databases.informix:

I would not disagree with the points you make, but one particular point almost makes me angry... Let me quote you:

"the system is slower than it was last week", can you refute them?


Why the hell should we refute users claims? This reminds me of my "daily wars" with people from all areas (DB, developers, OS, network, SANs etc). Everybody's concern is to make clear that it's not their fault, when everyone should provide information that would make it easier to find out what is causing the issues...I strongly believe that the users don't complaint without a reason... So what we should "benchmark" are activity metrics... disk reads, full scans etc.

On IDS 11.x it's easier to collect them, but it's relatively easy to collect them in any informix version. Some simple onstats or sysmaster queries can be very valuable...


My responses are:
Why the hell should we refute users claims?


Mostly, because they're talking out of their arseholes. You have to understand this is only my experience, but after twenty-five years of doing this for a living, I can count the number of genuine instances of the system actually being demonstrably slower on the fingers of one hand.

By contrast, I get told that "the system is slower" about once a week. So, on average, one out of every 260 times I get told the system is slower, it actually IS slower. The other 259 times, the user is talking shit.

Everybody's concern is to make clear that it's not their fault, when everyone should provide information that would make it easier to find out what is causing the issues...


Well, I think I highlighted that in my example: I exonerated the database and the network by demonstrating something that even a CIO can understand: how long it takes to do a repeatable operation. This was important for finding the root cause.

what we should "benchmark" are activity metrics... disk reads, full scans etc.


onstat information might mean something to you and me, but it don't mean shit to a CIO. Time is something everyone can understand. Both things have their place.

In an ideal world Fernando would be completely right and I'd be completely wrong. But in my experience, we're not in an ideal world, and you're probably better off listening to both of us, rather than one of us.

Thursday, 17 July 2008

Tech Tip du Jour: Variable Length Records

Nothing pisses me off more than people who use variable length records.*

People coming from an Oracle background seem to be very keen on the idea, I suppose when your database is so shit that you have half the world's hardware to keep your CRM system up, it doesn't really matter. But actually, unless you really do have very variable data, variable length fields are a bad idea:

  1. Variable length fields all have a storage size overhead. I've seen far too many fucking idiots with VARCHAR(1) and VARCHAR(2) in their tables.
  2. People ALWAYS say: "well, since it's a VARCHAR, we'll make it the maximum possible size, because there's no downside to that." Well, guess what? There fucking is: when the engine inserts a variable length row, it checks to see if the maximum possible length of the row would fit in the remainder of the page.
    This means that if your maximum row length is more than 50% of your page size, each row will go on its own page, even if the actual row is only 5 bytes long. This is fixed in 11.1, but only if you a) configure MAX_FILL_DATA_PAGES and b) unload and reload all your data. Plus, the engine has extra work to do: without MAX_FILL_DATA_PAGES, the engine has to calculate the maximum row size; with it, the engine may have to move the row to another page if an UPDATE increases the row length sufficiently.
  3. I can count on the fingers of one hand the number of times I've seen VARCHARS and LVARCHARS used for a justifiable reason. Inevitably, people use VARCHARS because they're too fucking lazy to do any kind of analysis on the contents of their data. This is the bit that really pisses me off, because it's an indication that there's going to be lots of shit you need to pick up in years to come. Don't be a cunt, do the spade work -- it always pays off in the end.

*This is a lie. But not much.

Tuesday, 15 July 2008

Tech Tip du Jour: A Quick Performance Check

Non-geeks may want to look away now.

This isn't a definitive guide to performance, I'm saving that for a day when ZNL doesn't fuck me off to the max. But if you want a quick and dirty guide to getting through probably 95% of all performance problems, it goes a little something like this:

1. Look at your application.
a) Do you have the right indexes? Just because a query is using an index, doesn't mean it's using the right index. I cannot stress this highly enough: just because a query is using an index, does not mean it's using the right index!!
b) Do you have too many indexes?
c) Could you achieve the same results with fewer singleton indexes and more composite indexes?
d) Profile your code.

2. Tune the engine.
a) Do you have enough buffers? Forget 95% read cache hit ratio, you want more than 99%. If you have a ton of buffers and you're still not over 99%, you've probably got too much read ahead. Or you've got some missing indexes -- in which case, why the fuck are you in section 2? Go back to section 1 and finish that first.
b) Do you have a lot of buffer waits? You probably need more buffers and / or more LRUs.
c) Do you have any unexplained, random performance slowdowns? It could be the B-Tree Cleaner. onstat -C hot will tell you how busy the cleaner is; onstat -C will tell you how much time the cleaner has been active since the last engine reboot or onstat -z. If you are spending a lot of time cleaning, then try increasing the threshold for cleaning by one or two orders of magnitude.

3. Look at your data placement.
a) Check onstat -D for hot dbspaces. Look in the oncheck -pe output for tables in those dbspaces, then look up the hex(partnum) of those tables in sysmaster:systabnames and finally look in the onstat -g ppf for those hex(partnums). Move naughty tables around.
b) While you're about it: are any of those tables heavily sequentially scanned (2nd last column) or do they have a read cache hit ratio below 95 (last column)? These may need indexing.

4. Look at the OS and tin.
a) Make sure you haven't allocated an unreasonable amount of OS buffers which may lead to swapping.
b) Are you CPU or memory bound?
c) Make sure your underlying disk model is what you think it is.
d) If you're using RAID-5, don't come to me looking for sympathy.
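The read cache check in 2(a) can also be done as a sysmaster query rather than eyeballing onstat -p. A sketch -- I'm going from memory on the sysprofile row names, so verify them against your version before trusting the number:

```sql
SELECT ROUND(100 * (br.value - dr.value) / br.value, 2) AS read_cache_pct
FROM sysmaster:sysprofile br, sysmaster:sysprofile dr
WHERE br.name = 'bufreads'
AND dr.name = 'dskreads';
```

If that comes back under 99, go back to section 2(a).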