(Data Processing|Data Integration)

About

Data processing is the general term for manipulating data, whereas data integration specifically means moving and combining data between two or more systems.

Management

Model

Data integration has roughly two data processing models:

  • stream processing: sends a message to another process, to be handled asynchronously. The processing executes continuously as long as data is being produced.
  • batch processing: periodically crunches a large amount of accumulated data. The processing runs to completion in a finite amount of time and releases its computing resources when finished.
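The difference between the two models can be illustrated with a minimal sketch. The function names `batch_total` and `stream_totals` and the sample events are hypothetical, chosen only for illustration: the batch function needs the whole accumulated input and returns once, while the stream function handles each record as it arrives and keeps emitting updated results for as long as records are produced.

```python
from typing import Iterable, Iterator


def batch_total(records: list[int]) -> int:
    """Batch: the accumulated data is fully available; run once, then finish."""
    return sum(records)


def stream_totals(records: Iterable[int]) -> Iterator[int]:
    """Stream: process each record on arrival and emit an updated result."""
    total = 0
    for value in records:
        total += value
        yield total  # a result is available after every record


events = [3, 5, 2]  # hypothetical sample data

print(batch_total(events))                # 10
print(list(stream_totals(iter(events))))  # [3, 8, 10]
```

Because `stream_totals` is a generator, it also works on an unbounded source: it never needs to see the end of the input to produce a result.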

I/O Pattern

Application            | Query                     | Selectivity | Processing | Data
OLAP                   | Changing Queries (ad-hoc) | A lot       | A lot      | Fixed Data
OLTP                   | Fixed Queries             | Few         | Few        | Changing Data
Streaming              | Fixed Queries             | All         | Few        | Changing Data
Batch (Data Warehouse) | Fixed Queries             | All         | A lot      | Fixed Data

See also: I/O - Workload (Access Pattern)

Data Processing Model / Framework

Function

See operations.

Image: a woman using a keypunch to tabulate the United States Census, circa 1940.

Lineage

You can visualize each data transformation step in a lineage report.
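As a rough illustration of what such a report could contain, the sketch below records one lineage entry per transformation step. The `Pipeline` class, its methods and the step names are hypothetical and do not come from any particular lineage tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical pipeline that records one lineage entry per step."""
    steps: list = field(default_factory=list)
    lineage: list = field(default_factory=list)

    def add(self, name: str, fn: Callable[[list], list]) -> "Pipeline":
        self.steps.append((name, fn))
        return self

    def run(self, rows: list) -> list:
        for name, fn in self.steps:
            out = fn(rows)
            # Record what went into and came out of this step.
            self.lineage.append(
                {"step": name, "rows_in": len(rows), "rows_out": len(out)}
            )
            rows = out
        return rows

    def report(self) -> str:
        return "\n".join(
            f"{i}. {e['step']}: {e['rows_in']} -> {e['rows_out']} rows"
            for i, e in enumerate(self.lineage, 1)
        )


pipeline = (
    Pipeline()
    .add("drop_nulls", lambda rows: [r for r in rows if r is not None])
    .add("double", lambda rows: [r * 2 for r in rows])
)
result = pipeline.run([1, None, 3])
print(result)             # [2, 6]
print(pipeline.report())
```

A real lineage report typically also tracks source and target datasets and column-level dependencies. This sketch only tracks row counts per step.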

Others

Goal

  • The “360 degree view of the enterprise” is a commonly discussed goal that really means data integration.

Term

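  • ETL: Extraction, Transformation and Load software. The data is transformed before it is loaded into the target system.
  • ELT: Extraction, Load and Transformation software. The raw data is loaded first and then transformed inside the target system.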
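The two terms differ only in where the transformation runs, which the following minimal sketch tries to show. It uses Python's standard `sqlite3` module with an in-memory database standing in for the target system. The table names, the sample rows and the cleaning rule (keep only rows whose amount is a whole number) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical extracted rows: amounts arrive as untidy text.
raw_rows = [("alice", " 10 "), ("bob", "25"), ("carol", "n/a")]


def etl(conn: sqlite3.Connection, rows: list) -> None:
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    cleaned = [
        (name, int(amount))
        for name, amount in rows
        if amount.strip().isdigit()
    ]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection, rows: list) -> None:
    """ELT: load the raw rows as-is, then transform in the target with SQL."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT name, CAST(TRIM(amount) AS INTEGER) AS amount
        FROM sales_raw
        WHERE TRIM(amount) <> ''
          AND TRIM(amount) NOT GLOB '*[^0-9]*'
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn, raw_rows)
elt(conn, raw_rows)

etl_result = conn.execute(
    "SELECT name, amount FROM sales_etl ORDER BY name"
).fetchall()
elt_result = conn.execute(
    "SELECT name, amount FROM sales_elt ORDER BY name"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]
print(etl_result)
print(elt_result)
```

Both paths end with the same cleaned table. With ELT, the untransformed rows also remain available in the target (`sales_raw` here), so the transformation can be rerun or changed later without extracting the data again.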

Magic Quadrant

Documentation / Reference

