About
The first measure of the plan_estimator, selectivity, represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a GROUP BY operator.
<MATH> \text{Selectivity} = \frac{\text{Numbers of rows satisfying a predicate}}{\text{Total number of rows}} </MATH>
The selectivity is tied to a query predicate. The selectivity of a predicate indicates how many rows from a row set will pass the predicate test.
Selectivity lies in a value range from 0.0 to 1.0 :
- 0.0 means that no rows will be selected from a row set,
- 1.0 means that all rows will be selected.
If no statistics are available then the optimizer either uses:
- dynamic sampling (if enabled)
- or an internal default value, Different internal defaults are used, depending on the predicate type. For example, the internal default for an equality predicate (last_name = 'Smith') is lower than the internal default for a range predicate (last_name > 'Smith'). The estimator makes this assumption because an equality predicate is expected to return a smaller fraction of rows than a range predicate.
When statistics are available, the estimator uses them to estimate selectivity.
For example:
- for an equality predicate (last_name = 'Smith'), selectivity is set to the reciprocal of the number n of distinct values of last_name, because the query selects rows that all contain one out of n distinct values. If a histogram is available on the last_name column, then the estimator uses it instead of the number of distinct values. The histogram captures the distribution of different values in a column, so it yields better selectivity estimates. Having histograms on columns that contain skewed data greatly helps the query optimizer generate good selectivity estimates.