The first destination of data extracted from the source is the staging area. Staging areas are sometimes also called landing zones, particularly for flat files, XML files, COBOL files and the like.
This logical layer:
In some implementations this layer may not be necessary, because all data transformation is done “on the fly” as data is extracted from the source system, before it is inserted directly into the Operational data layer (Foundation).
The most basic approach for the staging layer is to make its schema identical to (a replica of) the data source:
INDEPENDENT TABLES / FILES that “arrive” when the data is ready on the source.
Because they are independent (no foreign keys, no referential integrity, no lookups, no matches, no checks of any kind), the processes add no overhead to the data load and can scale at high speed.
Truncate and reload makes the processes completely restartable and fast: a failed load can simply be rerun against an emptied table.
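The truncate-and-reload pattern described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes an in-memory SQLite database, and the table name `stg_customer`, its columns, and the helper `reload_staging` are all hypothetical. SQLite has no `TRUNCATE` statement, so an unqualified `DELETE` stands in for it here.

```python
import sqlite3


def reload_staging(conn, table, columns, rows):
    """Empty a staging table and reload it inside a single transaction.

    The table and column names are trusted identifiers supplied by the
    load job itself, never by end users.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(f"DELETE FROM {table}")  # stand-in for TRUNCATE
        conn.executemany(
            f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
            rows,
        )
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Independent staging table: no keys, no constraints, all columns as text.
conn.execute(
    "CREATE TABLE stg_customer (customer_id TEXT, name TEXT, country TEXT)"
)

extract = [("1", "Ada", "UK"), ("2", "Grace", "US")]  # hypothetical extract
columns = ["customer_id", "name", "country"]

first_run = reload_staging(conn, "stg_customer", columns, extract)
second_run = reload_staging(conn, "stg_customer", columns, extract)
```

Running the load twice leaves the same row count as running it once, which is what makes the process safe to restart after a failure.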
The whole point of staging batch data is:
The staging area may also be the layer where data rejected for data type reasons is retained. If we report directly on this layer, we might see data quality problems.
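One way to retain rows rejected for data type reasons is to attempt the type conversion on each raw row and route failures to a separate reject collection, along with the reason. The sketch below assumes a hypothetical order feed with the fields `order_id`, `amount` and `order_date`; the function names and the sample rows are illustrative only.

```python
from datetime import date


def parse_row(raw):
    """Convert the raw text fields of one row to their target data types."""
    return {
        "order_id": int(raw["order_id"]),
        "amount": float(raw["amount"]),
        "order_date": date.fromisoformat(raw["order_date"]),
    }


def split_rows(raw_rows):
    """Separate rows that convert cleanly from rows that fail conversion."""
    accepted, rejected = [], []
    for raw in raw_rows:
        try:
            accepted.append(parse_row(raw))
        except (KeyError, ValueError) as exc:
            # Keep the untouched raw row plus the reason, for later review.
            rejected.append({"raw": raw, "reason": str(exc)})
    return accepted, rejected


raw_rows = [  # hypothetical sample data
    {"order_id": "100", "amount": "19.99", "order_date": "2024-01-15"},
    {"order_id": "101", "amount": "abc", "order_date": "2024-01-16"},
    {"order_id": "102", "amount": "5.00", "order_date": "2024-13-01"},
]

accepted, rejected = split_rows(raw_rows)
```

Keeping the raw row alongside the failure reason lets the rejects be inspected or reprocessed without going back to the source system.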