About
One of the most challenging aspects of data warehouse administration is the development of ETL (extract, transform, and load) processes that load data from OLTP systems into data warehouse databases. When you load data into a large, partitioned table, you swap the table that contains the data to be loaded with an empty partition in the partitioned table and the data appears in the target table instantaneously together with any indexes associated with it, giving you the ability to prepare, load and index data offline, and then have it instantly appear in the data warehouse.
Other instances in which large amounts of data are moved include:
- replicating data from production to QA databases,
- populating data marts from main data warehouses and vice versa,
- and populating summary tables from transaction tables.
Partitions may prove to be extremely valuable in these processes.
Articles Related
Example
Take, for instance, the TRANS table and another table—TRANS_SUMMARY—which records the total transaction amount for a particular date. The following describes the TRANS_SUMMARY table:
SQL> desc trans_summary
Name Null? Type
--------- --------- ---------------------
PRODUCT_CODE NUMBER
TOT_AMOUNT NUMBER(12,2)
Traditional approach
Definition
A traditional approach to populating this data warehouse table every day might be as follows:
insert into trans_summary
select product_code, sum(trans_amount)
from trans
group by product_code;
Problems
This approach works but is plagued by some problems:
- The INSERT statement will generate a large amount of undo and redo, which will affect the performance of the database. You can reduce the undo and redo generation by introducing multiple commits in a loop while loading data, but the overall elapsed time will increase significantly. You can use the direct path INSERT (using the APPEND hint) with NOLOGGING, as follows:
insert /*+ APPEND NOLOGGING */
into trans_summary
select product_code, sum(trans_amount)
from trans
group by product_code;
Data being loaded into the TRANS_SUMMARY table will remain unavailable, however, until the data load is complete.
- A daily process will have to wipe the TRANS_SUMMARY table clean before loading fresh data; otherwise, it will load duplicates. This table is unavailable for general use from the time it is truncated until it is finally loaded. On some very large systems—depending on the complexity of the query, the size of the table, and the general load on both the source and target databases—this process can take hours, during which the users cannot see even the old data (which has been deleted prior to loading). If the INSERT statement fails, due to lack of space or data errors, the users will have to wait until the new data is loaded, which again can be hours.
Partition approach
Definition
The use of partitions eliminates or largely mitigates these two issues. Because the summary table is grouped by PRODUCT_CODE, you can create the TRANS_SUMMARY table list partitioned on the PRODUCT_CODE column:
create table trans_summary (
product_code number,
tot_amount number(12,2)
)
partition by list (product_code)
(
partition p1 values (1),
partition p2 values (2),
partition p3 values (3),
partition p4 values (4),
partition p5 values (5),
partition p6 values (6),
partition p7 values (7),
partition p8 values (8),
partition p9 values (9),
partition p10 values (10)
)
During the loading process, you can load the data from TRANS into TRANS_SUMMARY partition by partition instead of loading the whole table. Here is how you would load the data for PRODUCT_CODE value=1:
- First create a temporary table whose structure is identical to that of the TRANS table except that it's not partitioned:
create table trans_summary_temp
nologging as
select
cast(1 as number) product_code,
cast(sum(trans_amount) as number(12,2)) tot_amt
from trans
where product_code = 1
group by 1;
Note that the table is created with the NOLOGGING option, which significantly reduces the undo and redo generation. This is a dynamically created table, and while it is being created, the original TRANS and TRANS_SUMMARY tables are fully accessible to users.
- Once the table has been created, you can exchange the TRANS_SUMMARY table with the p1 partition:
alter table trans_summary
exchange partition p1
with table trans_summary_temp
including indexes;
This operation instantly makes the data in the TRANS_SUMMARY_TEMP table available as the data in the p1 partition of the TRANS_SUMMARY table. This “switch” merely updates the pointers in the data dictionary; no data is actually moved. So this process is extremely fast (usually a few seconds), and the table is locked for that period only. The table data is available to users at all other times in the process.
Key advantages
So this approach has four key advantages over loading the table directly:
- The performance impact is low.
- The main table is always available (except for a few seconds).
- In case of data-loading failure, the old data is available.
- You can refresh data selectively.