Non-geeks may want to look away now.
This isn't a definitive guide to performance, I'm saving that for a day when ZNL doesn't fuck me off to the max. But if you want a quick and dirty guide to getting through probably 95% of all performance problems, it goes a little something like this:
1. Look at your application.
a) Do you have the right indexes? Just because a query is using an index, doesn't mean it's using the right index. I cannot stress this highly enough: just because a query is using an index, does not mean it's using the right index!!
b) Do you have too many indexes?
c) Could you achieve the same results with fewer singleton indexes and more composite indexes?
d) Profile your code.
2. Tune the engine.
a) Do you have enough buffers? Forget 95% read cache hit ratio, you want more than 99%. If you have a ton of buffers and you're still not over 99%, you've probably got too much read ahead. Or you've got some missing indexes -- in which case, why the fuck are you in section 2? Go back to section 1 and finish that first.
b) Do you have a lot of buffer waits? You probably need more buffers and / or more LRUs.
c) Do you have any unexplained, random performance slowdowns? It could be the B-Tree Cleaner: onstat -C hot will tell you how busy the cleaner is, and onstat -C will tell you how much time the cleaner has been active since the last engine reboot or onstat -z. If you are spending a lot of time cleaning, try increasing the cleaning threshold by one or two orders of magnitude.
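The 99% figure in (a) comes straight from the bufreads and dskreads counters that onstat -p prints; here's the arithmetic as a quick sketch (the function name and the sample numbers are mine):

```python
def read_cache_hit_ratio(bufreads, dskreads):
    """Percentage of logical reads satisfied from the buffer pool.

    bufreads: logical reads against the buffer pool (onstat -p "bufreads")
    dskreads: reads that actually went to disk (onstat -p "dskreads")
    """
    if bufreads == 0:
        return 0.0
    return 100.0 * (bufreads - dskreads) / bufreads

# A million logical reads with 8,000 physical reads is 99.2% cached:
# over the 99% target, so the buffer pool is probably big enough.
print(round(read_cache_hit_ratio(1_000_000, 8_000), 1))  # -> 99.2
```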
3. Look at your data placement.
a) Check onstat -D for hot dbspaces. Look in the oncheck -pe output for tables in those dbspaces, then look up the hex(partnum) of those tables in sysmaster:systabnames, and finally look in the onstat -g ppf output for those hex(partnum)s. Move naughty tables around.
b) While you're about it: are any of those tables heavily sequentially scanned (second-to-last column), or do they have a read cache hit ratio below 95% (last column)? These may need indexing.
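The partnum juggling in (a) is just decimal-to-hex and back: sysmaster:systabnames stores partnum as a plain decimal integer, while oncheck -pe and onstat -g ppf print it in hex. A minimal sketch (the function names are mine; the sample partnum is borrowed from the onstat output quoted further down):

```python
def partnum_to_hex(partnum):
    """Render a decimal partnum (as stored in sysmaster:systabnames)
    the way onstat -g ppf and oncheck -pe print it."""
    return f"0x{partnum:08X}"

def hex_to_partnum(text):
    """The inverse: turn an onstat-style hex partnum back into the
    decimal value to match against systabnames.partnum."""
    return int(text, 16)

print(partnum_to_hex(11534340))      # -> 0x00B00004
print(hex_to_partnum("0x00B00004"))  # -> 11534340
```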
4. Look at the OS and tin.
a) Make sure you haven't allocated an unreasonable number of OS buffers, which may lead to swapping.
b) Are you CPU or memory bound?
c) Make sure your underlying disk model is what you think it is.
d) If you're using RAID-5, don't come to me looking for sympathy.
4 comments:
Damn here goes my chance to publish a WHOLE book on the subject!
BTSCANNER ... not so 1994
Obnoxio The Clown wrote:
> Warning: I have been drinking.
>
> http://obotheclown.blogspot.com/2008/07/tech-tip-du-jour-quick-performance.html
>
> 2. Tune the engine.
> ...
> c) Do you have any unexplained, random performance slowdowns? It could be the B-Tree Cleaner. onstat -C hot will tell how busy the cleaner is, onstat -C will tell you how much time the cleaner has been active since the last engine reboot or onstat -z. If you are spending a lot of time cleaning, then try increasing the threshold for cleaning by one or two orders of magnitude.
Well, I learnt something ...
one order of magnitude is 10 times bigger, i.e. 10 to the power 1
two orders of magnitude is 100 times bigger, i.e. 10 to the power 2
Sweet.
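Applied to the cleaner threshold from the original post, that arithmetic looks like this (a trivial sketch; the function name is mine, and the 500 starting value is the old default mentioned below):

```python
def raise_by_orders_of_magnitude(value, orders):
    """One order of magnitude is x10, two is x100: value * 10**orders."""
    return value * 10 ** orders

# Old default BTSCANNER threshold of 500, raised by one and two orders:
print(raise_by_orders_of_magnitude(500, 1))  # -> 5000
print(raise_by_orders_of_magnitude(500, 2))  # -> 50000
```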
Soooooooo, what DOES onstat -C hot tell us:
Index Hot List
==============
Current Item 7 List Created 15:17:14
List Size 6 List expires in 291 sec
Hit Threshold 2000 Range Scan Threshold 200
Partnum Key Hits
0x00B00004 1 8406 *
0x00A00006 1 8390 *
0x00A00005 1 8274 *
0x00B00003 1 5907 *
0x00A00004 1 4794 *
0x00A00003 1 2724 *
So, not really that much - certainly I can't see from this that I have "badly configured BTSCANNER settings".
I think that onstat -C clean will give a better idea of the effectiveness of the settings:
> onstat -C clean | more
IBM Informix Dynamic Server Version 10.00.FC8 -- On-Line -- Up 20 days 23:18:03 -- 263776 Kbytes
Btree Cleaner Info
Index Cleaned Statistics
=========================
Partnum Key Dirty Hits Clean Time Pg Examined Items Del Pages/Sec
...
0x00600004 1 793 501 7159768 783 14290.95
0x00600005 1 563 14 279691 734 19977.93
0x00600006 1 427 14 308061 517 22004.36
0x00600007 1 8 5 54119 597 10823.80
0x00600008 1 1840 0 0 0 0.00
0x00600009 1 8 5 49415 501 9883.00
0x0060000a 1 723 1 49703 729 49703.00
0x0060000b 1 1582 0 0 0 0.00
0x0060000c 1 1344 254 5370704 8201 21144.50
0x0060000d 1 C 188 23195 484510382 9206 20888.57
0x0060000e 1 1313 385 5004046 403 12997.52
...
So, from the above, the settings are crap :D
For example, for 0x0060000d, 484,510,382 pages were examined (hmmm, about 924 Gig on a 2k system), and 9,206 items were deleted.
Bloody hell, crap :-/ I read nearly a terabyte of data, and deleted less than 10,000 index entries. Whoa there Lesley, don't buy another disk array, increase the threshold!
So, with the default settings this is a bit concerning.
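That "924 Gig" figure checks out: Pg Examined multiplied by the 2k page size, divided down to gigabytes. A sketch of the arithmetic (the function name is mine):

```python
def pages_to_gig(pages, page_kb=2):
    """Convert a Pg Examined count into gigabytes read,
    assuming a 2k page size as in the example instance."""
    return pages * page_kb / (1024 * 1024)

# 484,510,382 pages examined on a 2k system:
print(round(pages_to_gig(484_510_382), 1))  # -> 924.1
```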
From 10.00.xC8 onwards it seems that the default threshold has increased from 500 to 5000 - one order of magnitude :D
BTSCANNER [num=scanner_threads][,threshold=committed_deleted][,rangesize=size]
BTSCANNER num=1,threshold=5000,rangesize=-1
So, what is a "good" setting for BTSCANNER - it depends :-/
I *think* you have to look at the bigger tables in the instance, see how many deletes occur over a period of, say, an hour, and then set the threshold to that value, so that you don't get huge sequential scans of the indexes all the time.
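That sizing rule can be sketched as follows. This is the commenter's heuristic, not an official formula; the function name and the 5000 floor (the 10.00.xC8 default mentioned above) are my framing:

```python
def suggest_btscanner_threshold(deletes_per_hour, floor=5000):
    """Heuristic: set the BTSCANNER threshold near the hourly
    committed-delete count of the bigger tables, never below the
    10.00.xC8 default of 5000, so the cleaner isn't constantly
    triggering huge sequential scans of the indexes."""
    return max(floor, deletes_per_hour)

print(suggest_btscanner_threshold(700))     # -> 5000
print(suggest_btscanner_threshold(42_000))  # -> 42000
```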
RANGESIZE I have a problem with - it works on the theory that you will not be deleting rows from disparate areas of the indexes (i.e. from the beginning, middle and end). From a real-world perspective I can't see how this can be a common case (sure, deletion of old data using a serial primary key perhaps, but what about all the other possible indexes?).
Anyways, hopefully something to ponder :D, and yes, orders of magnitude would seem to be appropriate for the THRESHOLD!
I still reckon my rule of thumb is good: if onstat -C shows more than 2-3 seconds of index cleaning per hour, you've probably got a problem.
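That rule of thumb is easy to mechanise. A sketch (the function name and cut-off framing are mine; the 23,195 seconds is the Clean Time for 0x0060000d in the output above, and roughly 503 hours matches the "20 days 23:18" uptime):

```python
def cleaner_looks_busy(clean_time_seconds, uptime_hours, limit_per_hour=3):
    """More than ~2-3 seconds of index cleaning per hour of uptime
    probably means the cleaning threshold is set too low."""
    return clean_time_seconds / uptime_hours > limit_per_hour

print(cleaner_looks_busy(23_195, 503))  # -> True (way over budget)
```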
great information.
Thank you, thank you, thank you.
What a pity that you don't post for Informix.
Kind regards