Data Warehousing - Data Loading (Fast Refresh)


Fast Refresh refer to the better way to quickly load data modifications of a data source.

Several techniques exist:

In fact, the load process is often the primary consideration in choosing the partitioning scheme of data warehouse tables and indexes. In very common scenario, the data warehouse is being loaded by time. You should them partition by time first.

Bulk loading data rates

Bulk loading data rates are governed by the following operations and hardware resources:

  • How fast can the data be read
  • How fast can data be written out
  • How much CPU power is available

Structural Database

  • Little number of rows per block
  • Too many indexes

Flat File loading

The database operations can be simplified to:

  • Read lines from flat files
  • Process lines into columns, internal data types and optionally compress
  • Write rows/columns out to disk

In most cases a reasonably size system becomes CPU bound, not write bound, on data loads as almost all Oracle data warehouses use compression which increases CPU consumption but reduces the IO requirement for the writes. Or putting it another way: Bulk loading into a compressed table should be a CPU bound operation, not a disk (write) bound operation.

Documentation / Reference

Powered by ComboStrap