Table of Contents

About

The first measure of the plan_estimator, selectivity, represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a GROUP BY operator.

<MATH> \text{Selectivity} = \frac{\text{Numbers of rows satisfying a predicate}}{\text{Total number of rows}} </MATH>

The selectivity is tied to a query predicate. The selectivity of a predicate indicates how many rows from a row set will pass the predicate test.

Selectivity lies in a value range from 0.0 to 1.0 :

  • 0.0 means that no rows will be selected from a row set,
  • 1.0 means that all rows will be selected.

If no statistics are available then the optimizer either uses:

  • dynamic sampling (if enabled)
  • or an internal default value, Different internal defaults are used, depending on the predicate type. For example, the internal default for an equality predicate (last_name = 'Smith') is lower than the internal default for a range predicate (last_name > 'Smith'). The estimator makes this assumption because an equality predicate is expected to return a smaller fraction of rows than a range predicate.

When statistics are available, the estimator uses them to estimate selectivity.

For example:

  • for an equality predicate (last_name = 'Smith'), selectivity is set to the reciprocal of the number n of distinct values of last_name, because the query selects rows that all contain one out of n distinct values. If a histogram is available on the last_name column, then the estimator uses it instead of the number of distinct values. The histogram captures the distribution of different values in a column, so it yields better selectivity estimates. Having histograms on columns that contain skewed data greatly helps the query optimizer generate good selectivity estimates.