A woman using a keypunch to tabulate the United States Census, circa 1940:
About
Data processing is the more general term for manipulating data, whereas data integration refers specifically to integrating data between two systems.
Management
Model
Data integration has roughly two data processing models:
Model | Description |
---|---|
stream processing (reactive processing) | Sends a message to another process, to be handled asynchronously. Processing executes continuously as long as data is being produced. |
batch processing | Periodically crunches a large amount of accumulated data. Processing runs to completion in a finite amount of time and releases computing resources when finished. |
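
The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration, not a real framework: the function names `batch_total` and `stream_totals` and the sample data are assumptions made for the example. The batch function sees all accumulated data at once and returns when done; the stream function handles each record as it arrives and emits an updated result each time.

```python
from typing import Iterable, Iterator, List


def batch_total(records: List[int]) -> int:
    """Batch: the accumulated data is fully available; run once and finish."""
    return sum(records)


def stream_totals(records: Iterable[int]) -> Iterator[int]:
    """Stream: process each record on arrival and emit a running result."""
    total = 0
    for value in records:
        total += value
        yield total  # a result is available after every record


# Hypothetical sample data standing in for produced records.
accumulated = [3, 5, 2]

batch_result = batch_total(accumulated)
stream_results = list(stream_totals(iter(accumulated)))

print(batch_result)
print(stream_results)
```

In the streaming case the generator would keep yielding for as long as the source keeps producing, whereas the batch call ends and frees its resources once the finite input is consumed.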
I/O Pattern
Application | Query | Selectivity | Processing | Data |
---|---|---|---|---|
OLAP | Changing Queries (ad-hoc) | A lot | A lot | Fixed Data |
OLTP | Fixed Queries | Few | Few | Changing Data |
Streaming | Fixed Queries | All | Few | Changing Data |
Batch (Data Warehouse) | Fixed Queries | All | A lot | Fixed Data |
See also: I/O - Workload (Access Pattern)
Data Processing Model / Framework
Function
See operations.
Lineage
You can visualize each data transformation step in a lineage report.
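As a rough sketch of the idea, lineage can be captured by recording every transformation step as it is applied. The helper `run_with_lineage`, the step names, and the sample data below are hypothetical and only illustrate the principle; real tools typically record far more metadata.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[List[int]], List[int]]]


def run_with_lineage(data: List[int], steps: List[Step]) -> Tuple[List[int], List[str]]:
    """Apply each step in order and record a lineage entry per step."""
    lineage = [f"source: {len(data)} rows"]
    for name, transform in steps:
        data = transform(data)
        lineage.append(f"{name}: {len(data)} rows")
    return data, lineage


# Hypothetical transformation steps.
steps: List[Step] = [
    ("drop_negatives", lambda rows: [r for r in rows if r >= 0]),
    ("double", lambda rows: [r * 2 for r in rows]),
]

result, lineage = run_with_lineage([1, -2, 3], steps)

# A plain-text lineage report, one line per step.
for entry in lineage:
    print(entry)
```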
Others
Goal
- The “360-degree view of the enterprise” is a commonly discussed goal that, in practice, means data integration.
Term
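- ETL: Extract, Transform and Load software (data is transformed before it is loaded into the target)
- ELT: Extract, Load and Transform software (data is loaded into the target first and transformed there)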
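
The two terms differ only in the order of the steps, which the following sketch tries to make concrete. Everything here is an assumption for illustration: `extract`, `transform`, `run_etl`, and `run_elt` are hypothetical functions, and a plain Python list stands in for the target system.

```python
from typing import Dict, List

Row = Dict[str, str]


def extract() -> List[Row]:
    """Hypothetical source: returns raw, uncleaned rows."""
    return [{"name": " Ada "}, {"name": "grace"}]


def transform(rows: List[Row]) -> List[Row]:
    """Clean each row: trim whitespace and normalize capitalization."""
    return [{"name": row["name"].strip().title()} for row in rows]


def run_etl(target: List[Row]) -> None:
    """ETL: transform outside the target, then load the cleaned rows."""
    target.extend(transform(extract()))


def run_elt(target: List[Row]) -> None:
    """ELT: load the raw rows first, then transform them inside the target."""
    target.extend(extract())
    target[:] = transform(target)


etl_target: List[Row] = []
elt_target: List[Row] = []
run_etl(etl_target)
run_elt(elt_target)

print(etl_target)
print(elt_target)
```

Both runs end with the same cleaned rows; what changes is where the transformation work happens, and whether the raw data ever lands in the target.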