Figure: A woman using a keypunch to tabulate the United States Census, circa 1940.
Data processing is the more general term for manipulating data, whereas data integration refers specifically to integrating data between two systems.
Data integration has roughly two data processing models:
Model | Description |
---|---|
stream processing (reactive processing) | Sends a message to another process, to be handled asynchronously. Processing executes continuously as long as data is being produced. |
batch processing | Periodically crunches a large amount of accumulated data. Processing runs to completion in a finite amount of time and releases computing resources when finished. |
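
The difference between the two models can be illustrated with a minimal Python sketch. The function names `batch_total` and `stream_totals` are hypothetical and chosen only for this illustration; a real stream would read from an unbounded source such as a message queue rather than from a list.

```python
from typing import Iterable, Iterator, List


def batch_total(records: List[int]) -> int:
    """Batch: all accumulated data is available up front.

    The job runs to completion in finite time and returns one result.
    """
    return sum(records)


def stream_totals(records: Iterable[int]) -> Iterator[int]:
    """Stream: handle each record as it arrives.

    An updated result is emitted after every record, for as long as
    the source keeps producing data.
    """
    total = 0
    for value in records:
        total += value
        yield total


if __name__ == "__main__":
    events = [3, 1, 4]  # illustrative data, not from the article
    print(batch_total(events))          # one result at the end
    print(list(stream_totals(events)))  # one result per incoming record
```

The batch function needs the whole input before it can answer, while the generator produces a running answer and never has to see the end of the input.
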
Application | Query | Selectivity | Processing | Data |
---|---|---|---|---|
OLAP | Changing Queries (ad-hoc) | A lot | A lot | Fixed Data |
OLTP | Fixed Queries | Few | Few | Changing Data |
Streaming | Fixed Queries | All | Few | Changing Data |
Batch (Data Warehouse) | Fixed Queries | All | A lot | Fixed Data |
See also: I/O - Workload (Access Pattern)
See operations.
You can visualize each data transformation step in a lineage report.
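
As a rough sketch of what such a report might capture, the following Python example records the name and the input and output row counts of each transformation step. The `LineageStep` class and the `run_pipeline` function are hypothetical names invented for this illustration and do not refer to any particular lineage tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LineageStep:
    """One entry of the (hypothetical) lineage report."""
    name: str
    rows_in: int
    rows_out: int


Step = Tuple[str, Callable[[List[int]], List[int]]]


def run_pipeline(rows: List[int], steps: List[Step]) -> Tuple[List[int], List[LineageStep]]:
    """Apply each step in order and record what went in and what came out."""
    lineage: List[LineageStep] = []
    for name, transform in steps:
        result = transform(rows)
        lineage.append(LineageStep(name, len(rows), len(result)))
        rows = result
    return rows, lineage


if __name__ == "__main__":
    steps: List[Step] = [
        ("drop_negative", lambda rows: [r for r in rows if r >= 0]),
        ("double", lambda rows: [r * 2 for r in rows]),
    ]
    output, lineage = run_pipeline([1, -2, 3], steps)
    for step in lineage:
        print(f"{step.name}: {step.rows_in} rows in, {step.rows_out} rows out")
```

A real lineage report typically also tracks the source and target of each step; the row counts here are only the simplest useful metadata.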