One of the most challenging aspects of data warehouse administration is developing ETL (extract, transform, and load) processes that load data from OLTP systems into data warehouse databases. When you load data into a large, partitioned table, you can swap a table containing the data to be loaded with an empty partition of the partitioned table; the data, along with any associated indexes, appears in the target table instantaneously. This gives you the ability to prepare, load, and index data offline and then have it appear in the data warehouse instantly.
Partitions can prove extremely valuable in these and in other processes that move large amounts of data.
Take, for instance, the TRANS table and another table—TRANS_SUMMARY—which records the total transaction amount for a particular date. The following describes the TRANS_SUMMARY table:
SQL> desc trans_summary
 Name           Null?    Type
 -------------- -------- ------------
 PRODUCT_CODE            NUMBER
 TOT_AMOUNT              NUMBER(12,2)
A traditional approach to populating this data warehouse table every day might be as follows:
insert into trans_summary
select product_code, sum(trans_amount)
from trans
group by product_code;
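One way to see the cost of this conventional load is to measure the redo the session generates before and after it runs. This is a sketch using the standard V$MYSTAT and V$STATNAME views; the 'redo size' statistic is a documented session statistic, though the figures you see will vary by system:

-- Capture the redo generated so far by the current session
select s.value as redo_bytes
from v$mystat s, v$statname n
where s.statistic# = n.statistic#
and n.name = 'redo size';

insert into trans_summary
select product_code, sum(trans_amount)
from trans
group by product_code;

-- Run the redo-size query again; the difference between the two
-- values is the redo generated by the conventional-path insert.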
This approach works but is plagued by two problems: it generates a large amount of undo and redo, and the data being loaded is unavailable to users until the load completes. The first problem can be mitigated with a direct-path insert:
insert /*+ APPEND */
into trans_summary
select product_code, sum(trans_amount)
from trans
group by product_code;
Data being loaded into the TRANS_SUMMARY table will remain unavailable, however, until the data load is complete.
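The restriction applies even within the loading session itself: after a direct-path insert, Oracle will not let the same session read the table until the transaction commits. A sketch of what that looks like:

insert /*+ APPEND */ into trans_summary
select product_code, sum(trans_amount)
from trans
group by product_code;

-- Querying the table now, before committing, fails with:
-- ORA-12838: cannot read/modify an object after modifying it in parallel
select count(*) from trans_summary;

commit;  -- after the commit, the table is readable again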
The use of partitions eliminates or largely mitigates these two issues. Because the summary table is grouped by PRODUCT_CODE, you can create the TRANS_SUMMARY table list partitioned on the PRODUCT_CODE column:
create table trans_summary (
product_code number,
tot_amount number(12,2)
)
partition by list (product_code)
(
partition p1 values (1),
partition p2 values (2),
partition p3 values (3),
partition p4 values (4),
partition p5 values (5),
partition p6 values (6),
partition p7 values (7),
partition p8 values (8),
partition p9 values (9),
partition p10 values (10)
);
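With this definition in place, you can confirm the partition layout in the data dictionary and read a single partition directly by name; a sketch:

-- List the partitions of TRANS_SUMMARY and their defining values
select partition_name, high_value
from user_tab_partitions
where table_name = 'TRANS_SUMMARY'
order by partition_position;

-- Read only the rows for PRODUCT_CODE 1 by naming its partition
select * from trans_summary partition (p1);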
During the loading process, you can load the data from TRANS into TRANS_SUMMARY partition by partition instead of loading the whole table. Here is how you would load the data for PRODUCT_CODE value=1:
create table trans_summary_temp
nologging as
select
cast(1 as number) product_code,
cast(sum(trans_amount) as number(12,2)) tot_amt
from trans
where product_code = 1
group by 1;
Note that the table is created with the NOLOGGING option, which significantly reduces undo and redo generation. While this dynamically created table is being built, the original TRANS and TRANS_SUMMARY tables remain fully accessible to users. Once the temporary table is ready, exchange it with partition p1:
alter table trans_summary
exchange partition p1
with table trans_summary_temp
including indexes;
This operation instantly makes the data in the TRANS_SUMMARY_TEMP table available as the data in the p1 partition of the TRANS_SUMMARY table. The “switch” merely updates pointers in the data dictionary; no data actually moves. The process is therefore extremely fast (usually a few seconds), and the table is locked only for that period; at all other times in the process, the table data remains available to users.
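Because each product code maps to its own partition, the load-and-exchange steps can be repeated for every partition. A minimal PL/SQL sketch, assuming product codes 1 through 10 map to partitions p1 through p10 as in the table definition above:

begin
  for i in 1 .. 10 loop
    -- Build the staging table offline for this product code
    execute immediate
      'create table trans_summary_temp nologging as
       select cast(' || i || ' as number) product_code,
              cast(sum(trans_amount) as number(12,2)) tot_amt
       from trans
       where product_code = ' || i;

    -- Swap the staging table with the matching partition
    execute immediate
      'alter table trans_summary exchange partition p' || i ||
      ' with table trans_summary_temp including indexes';

    -- The staging table now holds the old partition data; drop it
    execute immediate 'drop table trans_summary_temp';
  end loop;
end;
/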
This approach has four key advantages over loading the table directly: