Table Function and ETL
Oracle9i introduces a family of new features to support more scalable and efficient ETL (Extraction, Transformation, Load) processing for data warehouses and business intelligence systems. One of the most exciting ETL additions is the Table Functions feature, which enables high-efficiency execution of ETL functions implemented in PL/SQL, C, or Java.
In an ETL process, the data extracted from a source system passes through a sequence of transformations before it is loaded into a data warehouse. Complex transformations are implemented in procedural programs, either outside the database or inside the database in PL/SQL or Java. Prior to Oracle9i, developers of ETL processes faced two challenges:
- Data Staging - When the results of a transformation grow too large to fit into memory, they must be staged: the results are materialized into database tables or flat files. The staged data is then read and processed as input to the next transformation in the sequence. Staging data complicates development and slows ETL processing.
- Transformation Parallelism - Adding parallelism to transformations requires complex programming, reducing developer productivity.
Oracle9i's table functions eliminate the need to stage data for transformations implemented in PL/SQL, C, or Java. The data flows between transformations without interruption, a process called “pipelining.” In addition, table functions enable transparent parallelism for the transformations. The figure below shows a sample transformation flow performed with staging and the same flow performed using table functions:
In the “transformation with staging” process of the figure, showing ETL prior to Oracle9i, external source data is loaded into the database table “stage 1.” This step exposes the source data to the database for further usage. Then the procedural transformation T1 reads this data, applies the first transformation and stages the intermediate result in table “stage 2.” Transformation T2 then reads the data from stage 2 and inserts it into the target table. Note that any parallel execution of one of those transformations must be coded manually.
In the “pipelined parallel transformation” process of the figure, using Oracle9i's table functions, the two staging tables disappear. The T1 transformation sends data directly to transformation T2 without the complications and delay of staging. In addition, the transformations are processed in parallel without any manual coding, as represented by the multiple transformation symbols for T1 and T2. Note that the transparent access of the external source data is done with the External Table feature introduced in Oracle9i. (To learn about external tables, see the Oracle9i Database Daily Features Archives)
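The external source data in the figure can be exposed to the database without a load step. As a hedged sketch (the directory path, file name, and column list below are illustrative assumptions, not part of this article), an external table definition might look like this:

```sql
-- Illustrative only: directory, file name, and columns are assumptions.
CREATE DIRECTORY ext_dir AS '/data/feeds';

CREATE TABLE source_ext (
  a INT,
  b DATE,
  c VARCHAR2(25)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER           -- read-only driver based on SQL*Loader
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('source.csv')
);
```

Once defined, the flat file can be queried like any table (`SELECT * FROM source_ext`) and fed directly into a transformation, with no staging table involved.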
Along with tremendous performance benefits, table functions provide great ETL flexibility. Table functions enable use of PL/SQL, C, or Java seamlessly with SQL. For instance, a transformation can SELECT from a table function, or pass the results of a SQL query directly into a table function for further processing. With table functions, transformation components can be combined and reused for faster deployment of new or changed data flows. Table functions thus enable you to remodel your transformation process flow quickly and efficiently.
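Chaining works by passing a cursor expression into the next table function. A hypothetical sketch, assuming two pipelined functions `t1` and `t2` that each accept a REF CURSOR parameter (these names are illustrative, not from this article):

```sql
-- Rows produced by t1 stream directly into t2; no staging table
-- is created between the two transformations.
SELECT *
FROM TABLE(t2(CURSOR(
       SELECT *
       FROM TABLE(t1(CURSOR(
              SELECT * FROM source_table))))));
```

Each `TABLE(...)` call can itself be the input of another, so an arbitrarily long transformation pipeline can be expressed in a single SQL statement.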
Typical Pipelined Example
These are the typical steps to perform when using PL/SQL table functions:
- The producer function must use the PIPELINED keyword in its declaration.
- The producer function must use an OUT parameter that is a record, corresponding to a row in the result set.
- Once each output record is completed, it is sent to the consumer function or statement by means of the PIPE ROW statement.
- The producer function must end with a RETURN statement that does not specify any return value.
- The consumer function or SQL statement must then use the TABLE keyword to treat the rows coming from the pipelined function like a regular table.
The first step is to define the format of the rows to be returned. In this case, each row consists of an INT, a DATE, and a VARCHAR2(25).
CREATE OR REPLACE TYPE myObjectFormat AS OBJECT (
  A INT,
  B DATE,
  C VARCHAR2(25)
)
/
Next, a collection type based on the object type just defined must be created.
CREATE OR REPLACE TYPE myTableType AS TABLE OF myObjectFormat
/
Finally, the producer function is created inside a package. The PIPELINED keyword in its declaration marks it as a pipelined function.
CREATE OR REPLACE PACKAGE myDemoPack AS
  FUNCTION prodFunc RETURN myTableType PIPELINED;
END;
/

CREATE OR REPLACE PACKAGE BODY myDemoPack AS
  FUNCTION prodFunc RETURN myTableType PIPELINED IS
  BEGIN
    FOR i IN 1 .. 5 LOOP
      PIPE ROW (myObjectFormat(i, SYSDATE + i, 'Row ' || i));
    END LOOP;
    RETURN;
  END;
END;
/
ALTER SESSION SET NLS_DATE_FORMAT='dd.mm.yyyy';

SELECT * FROM TABLE(myDemoPack.prodFunc());

         A B          C
---------- ---------- ---------
         1 31.05.2004 Row 1
         2 01.06.2004 Row 2
         3 02.06.2004 Row 3
         4 03.06.2004 Row 4
         5 04.06.2004 Row 5
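The parallel execution shown in the figure is enabled by the PARALLEL_ENABLE clause. A hedged sketch, reusing the types above and assuming the function reads its input through a weak REF CURSOR (the function and variable names are illustrative):

```sql
-- Illustrative sketch: PARTITION p_cur BY ANY tells Oracle it may
-- distribute the input rows across parallel execution servers in
-- any way, so the transformation runs in parallel without manual coding.
CREATE OR REPLACE FUNCTION parFunc(p_cur SYS_REFCURSOR)
  RETURN myTableType
  PIPELINED
  PARALLEL_ENABLE (PARTITION p_cur BY ANY)
IS
  v_a INT;
  v_b DATE;
  v_c VARCHAR2(25);
BEGIN
  LOOP
    FETCH p_cur INTO v_a, v_b, v_c;
    EXIT WHEN p_cur%NOTFOUND;
    -- Transform the row here, then pipe it to the consumer.
    PIPE ROW (myObjectFormat(v_a, v_b, v_c));
  END LOOP;
  CLOSE p_cur;
  RETURN;
END;
/
```

Invoked as `SELECT * FROM TABLE(parFunc(CURSOR(SELECT a, b, c FROM some_source)))`, such a function inherits the degree of parallelism of the query feeding it.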