Table of Contents

Oracle Database - MERGE JOIN operation

About

MERGE JOIN joins tables by merging sorted lists of records from each table on the join column. It is effective for large batch operations, but may be ineffective for joins used by transaction-processing applications.

MERGE JOIN is used whenever Oracle cannot use an index while conducting a join. In the following example, all of the tables are fully indexed. So the example deliberately disables the indexes by adding 0 to the numeric keys during the join to force a merge join to occur.

MERGE JOIN is a set operation.

Example

select COMPANY.Name 
  from COMPANY, SALES
 where COMPANY.Company_ID+0 = SALES.Company_ID+0
   and SALES.Period_ID =3
   and SALES.Sales_Total>1000;

Execution Plan

MERGE JOIN
SORT JOIN
TABLE ACCESS FULL SALES
SORT JOIN
TABLE ACCESS FULL COMPANY

Interpreting the Execution Plan

There are two potential indexes that could be used by a query joining the COMPANY table to the SALES table. First, there is an index on COMPANY.COMPANY_ID - but that index cannot be used because of the +0 value added to it.

As shown in the plan, Oracle will perform a full table scan (TABLE ACCESS FULL) on each table, sort the results (using the SORT JOIN operation), and merge the result sets. The use of merge joins indicates that indexes are either unavailable or disabled by the query’s syntax.

Steps Involved in MERGE JOINS

A MERGE JOIN operation joins the results of two data scans. The operation usually consists of the following steps:

The use of merge joins indicates that indexes are either unavailable or disabled by the query’s syntax.

As a set operation that performs full table scans of its tables, MERGE JOIN is usually not appropriate for use in multi-user online applications for the following reasons:

There are, however, situations in which MERGE JOIN is the most efficient way to perform a join. In batch operations or large reports, MERGE JOIN may be result in the best possible throughput for your query.

Tuning Implications for MERGE JOINS

MERGE JOIN will be effective wherever a full table scan will be effective. That is, in situations in which a full table scan is preferable to an index range scan/table access by ROWID combination, a MERGE JOIN operation will be preferable to a NESTED LOOPS join operation.

A full table scan is preferable under the following conditions:

During the SORT JOIN operation, Oracle sorts as much of the data as possible inside a memory area called the sort area. The maximum size of the sort area is defined by the SORT AREA SIZE parameter in the instance’s init.ora file; it is allocated in 8K increments. If the sort area is not large enough to hold all the sorted data, Oracle will allocate a temporary segment for the duration of the query. The cost of the allocation and release of the temporary segment is roughly equivalent to the cost of about 10 INSERT/UPDATE/DELETE statements. The allocation and release of temporary segments require locks to be allocated. When multiple users concurrently attempt to sort large result sets, they may experience contention during the temporary segment lock allocation process.

Data from the temporary segment is loaded into the sort area to be sorted. Every set of “load and sort” steps is called a run. If the result set is large relative to the sort area size, the data could be loaded and unloaded multiple times to and from the disk, which would generate excessive I/O.

In Oracle, writes to temporary segments bypass the SGA via a technique called Direct Sort Writes. Direct Sort Writes improve the performance of writes to temporary segments by at least 40 percent over earlier versions of the kernel.

The third step of MERGE JOIN, the comparison and merge portion, is very efficient. The performance costs involved in MERGE JOINs are almost entirely found within the first two steps: the full table scans and the sorting operations. Tuning MERGE JOIN operations should therefore focus on improving the performance of the first two steps.

The performance of full table scans can be improved through :

You can improve sort performance by setting a high value for SORT AREA SIZE or by dedicating a tablespace to temporary segment activity.

Documentation / Reference