(Data Processing|Data Integration)


Data processing is the more general term: it covers any manipulation of data. Data integration is narrower: it is the integration of data between two systems.



Data integration has roughly two data processing models:

  • stream processing: processing that executes continuously as long as data is being produced. A message is sent to another process, to be handled asynchronously.
  • batch processing: processing that periodically crunches a large amount of accumulated data. It runs to completion in a finite amount of time and releases its computing resources when finished.
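The difference between the two models can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `batch_total` and `stream_totals` are hypothetical, and a plain list stands in for the data source. The batch function sees all accumulated data at once and returns a single result, while the streaming function emits an updated result for every record it receives.

```python
from typing import Iterable, Iterator, Sequence


def batch_total(records: Sequence[int]) -> int:
    """Batch: all accumulated data is available; run once, then finish."""
    return sum(records)


def stream_totals(records: Iterable[int]) -> Iterator[int]:
    """Stream: produce an updated result for each record as it arrives."""
    total = 0
    for value in records:
        total += value
        yield total


events = [3, 5, 2]  # hypothetical input
print(batch_total(events))          # 10
print(list(stream_totals(events)))  # [3, 8, 10]
```

With a real stream the input iterator would never end, so `stream_totals` would keep yielding results for as long as data is produced.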

I/O Pattern

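| Application            | Query                     | Selectivity | Processing | Data          |
|------------------------|---------------------------|-------------|------------|---------------|
| OLAP                   | Changing Queries (ad-hoc) | A lot       | A lot      | Fixed Data    |
| OLTP                   | Fixed Queries             | Few         | Few        | Changing Data |
| Streaming              | Fixed Queries             | All         | Few        | Changing Data |
| Batch (Data Warehouse) | Fixed Queries             | All         | A lot      | Fixed Data    |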

See also: I/O - Workload (Access Pattern)

Data Processing Model / Framework


See operations.

Image (Card Puncher Data Processing): a woman using a keypunch to tabulate the United States Census, circa 1940.


You can visualize each data transformation step in a lineage report.
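As a rough sketch of what such a report could contain, the following Python snippet runs a list of transformation steps and records one lineage entry per step. Everything here is an assumption made for illustration: the helper `run_with_lineage`, the step names, and the choice to record row counts are hypothetical and do not describe any particular tool.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Dict]
Step = Tuple[str, Callable[[Rows], Rows]]


def run_with_lineage(rows: Rows, steps: List[Step]) -> Tuple[Rows, List[Dict]]:
    """Apply each step in order and record what it did to the data."""
    lineage = []
    for name, transform in steps:
        result = transform(rows)
        lineage.append({"step": name, "rows_in": len(rows), "rows_out": len(result)})
        rows = result
    return rows, lineage


# Hypothetical pipeline: drop incomplete rows, then convert amounts to cents.
steps = [
    ("drop_missing_amount", lambda rs: [r for r in rs if r["amount"] is not None]),
    ("to_cents", lambda rs: [{**r, "amount": r["amount"] * 100} for r in rs]),
]
source = [{"id": 1, "amount": 2}, {"id": 2, "amount": None}]

output, report = run_with_lineage(source, steps)
for entry in report:
    print(entry)
```

Each entry in `report` corresponds to one transformation step, which is the information a lineage visualization would draw as a node or an edge.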



  • The “360-degree view of the enterprise” is a commonly discussed goal that really means data integration.


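  • ETL: Extraction, Transformation and Load software (data is transformed before it is loaded into the target system)
  • ELT: Extraction, Load and Transformation software (data is loaded into the target system first and transformed there)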
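The two orderings can be compared with a minimal sketch that uses Python's standard `sqlite3` module as a stand-in target system. The table names (`etl_sales`, `raw_sales`, `elt_sales`) and the sample rows are hypothetical. In the ETL path the cleaning happens in the pipeline code before loading; in the ELT path the raw rows are loaded unchanged and the cleaning is done with SQL inside the target.

```python
import sqlite3

# Hypothetical extract from a source system: amounts arrive as untidy text.
raw_rows = [("alice", " 10 "), ("bob", "25")]

conn = sqlite3.connect(":memory:")

# ETL: transform in the pipeline, then load the cleaned rows.
conn.execute("CREATE TABLE etl_sales (customer TEXT, amount INTEGER)")
cleaned = [(name, int(amount.strip())) for name, amount in raw_rows]
conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", cleaned)

# ELT: load the raw rows as-is, then transform inside the target with SQL.
conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)
conn.execute(
    "CREATE TABLE elt_sales AS "
    "SELECT customer, CAST(TRIM(amount) AS INTEGER) AS amount FROM raw_sales"
)

etl_result = conn.execute("SELECT * FROM etl_sales ORDER BY customer").fetchall()
elt_result = conn.execute("SELECT * FROM elt_sales ORDER BY customer").fetchall()
print(etl_result == elt_result)  # True
```

Both paths end with the same cleaned table; the difference is where the transformation runs and whether the raw data is kept in the target system.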

Magic Quadrant

ETL Software

Documentation / Reference

Task Runner