When you should we use a btree index.
Rule of thumb
If you use the index to access the table, then you will need to ensure that you are processing a small percentage of the total table. 10% selectivity is the minimum selectivity necessary for a b-tree index to be helpful
In general
B*Tree index would be placed :
You will read the index to get a row in the table. Here, you want to access a very small percentage of the rows in the table
nico@ORCL>set autotrace traceonly explain
nico@ORCL>select owner, status from t where owner=user;
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 154 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 7 | 154 | 7 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 7 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
With a query plan like this, you should be accessing a very small percentage of this table.
The issue to look at here is the :
This means that Oracle :
When we read through an index to access the table, we will perform lots of scattered, random I/O. By scattered, I mean that the index will tell us to read :
It won't ask us to read :
in a consecutive manner and the single block I/O can be very slow.
This is not the most efficient method if you are going to have to access a large percentage of the rows. It will be generally take longer to access them via B*tree than by just full scanning table if we access a too high a percentage of the rows around :
For a thin table :
We have then :
From here, the math is very easy. We are going to read 20,000 rows via the index. This will mean, quite likely 20,000 TABLE ACCESS BY ROW ID operations.
We will process 20,000 table blocks to execute this query. However, there are only 1,000 blocks in the entire table !
(Even with a 800 bytes per row and 10 rows per block, we now have 10,000 blocks in table).
In this case, a full table scan will be much more efficient than using an index, as it has to touch each block once.
Any query that used this index to access the data would not be very efficient until it accesses :
How the data is organized physically on disk deeply impacts these calculations, as it materially affects how expensive (or inexpensive) index access will be.
If the table is naturally clustered in order by the primary key and you issue the query
select * from primary_key between :x and :y
an index range scan may be useful even if it accesses a large percentage of rows, simply because the database blocks that we need to read and reread will most likely cached, since the data is co-located.
Table | CPU Time | Logical I/O |
---|---|---|
Co-located | 0.59 seconds | 14,495 |
Disorganized | 0.85 seconds | 106,815 |
Co-located % | 70% | 13% |
The query against a disorganized table bears out the simple math we did earlier. Here is the perfect illustration of why rules of thumb are so hard to provide :
So when you have to respond “Why is it running differently between the test and production machine ?” Because they are not identical.
The index contains enough information to answer the entire query - we will not have to go to the table at all. The index will be used as a thinner version of the table.
You can process 100 percent (or any percentage, in fact) of the rows via the index. You might use an index just to create a “thinner” version of the table. The following query demonstrates this concept.
select count(*) from t where owner = user;
Execution Plan
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | INDEX RANGE SCAN| T_IDX | 7 | 119 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Here only the index is used to answer the query. The underlying table is never accessed, we simply scanned the index structure itself.
Run this query to get an idea of how your indexes are set up.
SELECT
INDEX_NAME "NAME",
DISTINCT_KEYS / NUM_ROWS * 100 "SELECTIVITY %",
NUM_ROWS,
DISTINCT_KEYS "DISTINCT",
LEAF_BLOCKS,
CLUSTERING_FACTOR,
BLEVEL "LEVEL",
AVG_LEAF_BLOCKS_PER_KEY "ALFBPKEY"
FROM
DBA_INDEXES
WHERE
DISTINCT_KEYS / NUM_ROWS < .1 AND
NUM_ROWS > 0
ORDER BY "SELECTIVITY %" DESC;