So, what is the right index? Well, it's an index that supports your query as fully as possible. For my examples, I'm going to consider the stores demo database that ships with IDS. It's tiny and it's pretty much optimised, but I'm going to use it to show examples of what I mean.
So, here's the schema:
{ TABLE items row size = 18 number of columns = 6 index size = 30 }
create table items
(
item_num smallint,
order_num integer,
stock_num smallint not null ,
manu_code char(3) not null ,
quantity smallint,
total_price money(8,2),
check (quantity >= 1 ),
primary key (item_num,order_num)
);
revoke all on items from "public";
alter table items add constraint (foreign key (order_num)
references orders );
alter table items add constraint (foreign key (stock_num,
manu_code) references stock );
Here's my test query:
SELECT order_num FROM items
WHERE stock_num = 8
AND quantity > 3
ORDER BY order_num
Here's my first query plan:
select order_num from items
where stock_num = 8
and quantity > 3
order by order_num
Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By
1) items: INDEX PATH
Filters: items.quantity > 3
(1) Index Keys: stock_num manu_code (Serial, fragments: ALL)
Lower Index Filter: items.stock_num = 8
Filters are generally a badness. They indicate that the data row is going to have to be inspected. This isn't a problem with a 67-row table, but it can lead to a lot of pain with a 67-million-row table.
Now I: CREATE INDEX foo1 ON items (stock_num, quantity);
And the plan changes to:
select order_num from items
where stock_num = 8
and quantity > 3
order by order_num
Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By
1) items: INDEX PATH
(1) Index Keys: stock_num quantity (Serial, fragments: ALL)
Lower Index Filter: (items.stock_num = 8 AND items.quantity > 3 )
Notice that there is no longer a filter needed to satisfy the query. But I can improve this query even further: although all the restriction is done via an index, I still have to go to the data rows to get the projection.
If I now: DROP INDEX foo1; CREATE INDEX foo1 ON items (stock_num, quantity, order_num);
The query plan is now:
select order_num from items
where stock_num = 8
and quantity > 3
order by order_num
Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By
1) items: INDEX PATH
(1) Index Keys: stock_num quantity order_num (Key-Only) (Serial, fragments: ALL)
Lower Index Filter: (items.stock_num = 8 AND items.quantity > 3 )
Notice the new "Key-Only" expression. This means that in order to get everything that I needed, I only accessed the index pages, I did not have to access the data pages at all. This is generally optimal.
Now, that was a fairly straightforward case. A more subtle case arises when you have to accept that you can't create an index and you have to choose between indexes:
For the sake of the argument, let's say I can't create any more indexes. We revert to the original index structure and my new query is this:
SELECT order_num FROM items
WHERE stock_num = 8
AND item_num = 1
ORDER BY order_num
My query plan is:
select order_num from items
where stock_num = 8
and item_num = 1
order by order_num
Estimated Cost: 1
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By
1) items: INDEX PATH
Filters: items.item_num = 1
(1) Index Keys: stock_num manu_code (Serial, fragments: ALL)
Lower Index Filter: items.stock_num = 8
Is this my best option, given that I have a unique, primary key index on item_num and order_num and a non-unique, composite index on stock_num and manu_code? Well, in this case, yes, because the composite index will be much more selective. Every order will have an item_num of 1, so using the primary key index would be useless. However, if 99% of orders only had one line and we were looking for things on line two, then the primary key might be more useful.
Good optimiser statistics would be your best hope for the engine getting this decision right, but I have on one or two occasions found that the optimiser gets the decision wrong, possibly because of the comparative size of the index or some other, subtle consideration.
Know your data, know what the query is doing and make sure the optimiser is doing the right thing.
Never assume that just because a query is using an index, it's using the correct index.
Update: a minor correction to one of my CREATE INDEX statements, thanks to Ian Goddard for spotting it.
No comments:
Post a Comment