About
There are specific optimizations between SQL parallel execution and Oracle Partitioning that you should bear in mind when you plan to use these functionalities together.
For example, two large partitioned tables that can take advantage of parallel partial or full partition-wise joins can be joined faster than if no partitioning is involved.
Ideally (sub)partitions are similar in size which can be achieved by using hash (sub)partitioning on a unique or almost unique column with the number of hash (sub)partitions a power of 2.
Large databases and particularly data warehouses – the types of databases that mostly use parallel execution – should always use Oracle Partitioning.
Partitioning can provide great performance improvements because of partition elimination (pruning) capabilities, but also because parallel execution plans can take advantage of partitioning.
Rules:
- If at least one of the tables accessed in the join has been partitioned on the join key the database may decide to use a partition-wise join.
- If both tables are equi-partitioned on the join key the database may use a full partition-wise join.
- Otherwise a partial partition-wise join may be used in which one of the tables is dynamically partitioned in memory followed by a full partition-wise join.
Articles Related
Example: Parallel plan with partitioning and partition-wise join
Consider two large tables SALES and CUSTOMERS. Partition:
- the SALES table using composite partitioning RANGE on ORDER_DATE, HASH on CUSTOMER_ID.
- the CUSTOMERS table using HASH partitioning on CUSTOMER_ID.
Parallel table joins between SALES and CUSTOMERS can now take advantage of full partition-wise joins.
Let's create the tables SALES and CUSTOMERS as follows:
- HASH partitioning on the ID column for the CUSTOMERS table using 128 partitions.
- Table SALES and CUSTOMERS are now equi-partitioned on the join column
-----------------------------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop| TQ | PQ Distrib |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
| 1 | PX COORDINATOR | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | | | Q1,01 | QC (RAND) |
| 3 | SORT GROUP BY | | | | Q1,01 | |
| 4 | PX RECEIVE | | | | Q1,01 | |
| 5 | PX SEND HASH | :TQ10000 | | | Q1,00 | HASH |
| 6 | SORT GROUP BY | | | | Q1,00 | |
| 7 | PX PARTITION HASH ALL | | 1 | 128 | Q1,00 | |
| 8 | HASH JOIN | | | | Q1,00 | |
| 9 | TABLE ACCESS FULL | CUSTOMERS | 1 | 128 | Q1,00 | |
| 10 | TABLE ACCESS FULL | SALES | 1 | 128 | Q1,00 | |
-----------------------------------------------------------------------------
This execution plan use the hash partitioned tables. Unlike without partitioning, you do not see the granules for table SALES and CUSTOMERS right away in the plan.
The simple reason for this because we are now using partition-based granules, so Oracle does not have to partition the data for parallel access at runtime; the database simply has to iterate over existing partitions.
Furthermore, we are joining two equi-partitioned tables leveraging a partition-wise join.
The partition-based granules are not only identical for both tables, but the iteration (processing) of granules is now a processing of pairs of partitions that includes the join as well.
One parallel server process is working on one equivalent partition pair at a given point in time. Consequently, the partition-based granule iterator is ABOVE the hash join operation in the execution plan.
Besides the known processing steps of parallel execution this new behaviour of a partition-wise join is seen in the execution plan.
- Tables SALES and CUSTOMERS are accessed in parallel, iterating over the existing equi-partitioned hash partition-based granules (ID 7). You can read this operation as “loop over all hash partitions and process the operations below”. A set of parallel servers is working on n partitions at a time (n equals the DOP), from partition 1 to 128 (identified through columns 'Pstart' and 'Pstop').
- For each HASH partition pair, a parallel server process joins the table CUSTOMERS and SALES.
No data redistribution is taking place to join tables SALES and CUSTOMERS.
In the case of inter-node parallel query, there would be no data transfer necessary between the compute nodes, and the Oracle database – although built on the shared everything paradigm - would behave like a shared nothing system for this operation.