Table of Contents

Data Warehouse

What’s a data warehouse?

A data warehouse is a large central data repository of current, history and summarised data coming from operational and external sources used primarily for analysis.

Data Warehouses is large historical databases for decision-support that are loaded with new data on a periodic basis.

“Single Version of the truth”

Data warehousing is a component of BI.

This category covers general subject over applications such as Business Intelligence and Decision Support Systems.

The structure of a data warehouse is very different from that of an operational system:

A data warehouse supports on-line analytical processing (OLAP), as against the on-line transaction processing (OLTP) supported by the operational systems.

In the past, DataWarehouse had to do three things:

Data Warehouses were invented first to improve the performance of large queries.

A DataWarehouse must be commutative throughout all this layers. A data set in the source must be retrieved in all layers (Staging Area, DataWarehouse, Data Mart, …)

Why a DataWarehouse ?

De-coupling the data warehouse from the source system makes the entire architecture more flexible in the long-term.

Table in a datawarehouse

A derived table is based on the content of the base, reference, and lookup tables and is the stored result of a query. The aim is to have a set of data in a day level generally in order to analyze a business process. An aggregate table is an aggregation which follow the hierarchy level based on the derived table in order to speed up the performance.

A lookup table is at a high level of a hierarchy. They are tables such as type, category, status, reason, role, group whereas the reference table is a the lowest level of the hierarchy.

Populate the tables in the following order:

Analyze the tables in one category before loading the tables in the next category (for example, analyze the reference tables before loading the lookup tables). Additionally, you can analyze all of the tables loaded by the source-ETL process before executing the intra-ETL processes).

The size of a warehouse

To plan your data warehouse accurately, you should do some experiments by loading some sample data and measure the actual storage size.

Development

System

Slowly Changing

ELT, ETL (Refresh, Load)

Scan (Access)

Storage

Glossary

DataWarehouse Specialized Database

Documentation, Tutorial