What is Data Processing (Data Integration)?

A woman using a keypunch to tabulate the United States Census, circa 1940.


Data processing is the general term for manipulating data, whereas data integration refers specifically to moving and combining data between two or more systems.



Data integration has roughly two data processing models:

  • Stream processing (reactive processing): send a message to another process, to be handled asynchronously. The processing executes continuously, as long as data is being produced.
  • Batch processing: periodically crunch a large amount of accumulated data. The processing runs to completion in a finite amount of time, releasing computing resources when finished.
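The contrast between the two models can be sketched in a few lines of Python. This is a minimal illustration, not a framework; the record shape and function names are hypothetical.

```python
def batch_process(records):
    """Batch: crunch all accumulated records at once, then finish."""
    return sum(r["amount"] for r in records)

def stream_process(record, state):
    """Stream: handle one record as it arrives, updating running state."""
    state["total"] = state.get("total", 0) + record["amount"]
    return state

records = [{"amount": 10}, {"amount": 5}, {"amount": 7}]

# Batch run: the whole accumulated data set is processed in one pass.
print(batch_process(records))  # 22

# Stream run: records are processed one by one as they are produced.
state = {}
for r in records:
    state = stream_process(r, state)
print(state["total"])  # 22
```

Both runs produce the same total; the difference is when the work happens (one finite pass versus continuous incremental updates) and when computing resources are released.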

I/O Pattern

Application            | Query                     | Selectivity | Processing | Data
OLAP                   | Changing queries (ad hoc) | A lot       | A lot      | Fixed data
OLTP                   | Fixed queries             | Few         | Few        | Changing data
Streaming              | Fixed queries             | All         | Few        | Changing data
Batch (Data Warehouse) | Fixed queries             | All         | A lot      | Fixed data

See also: I/O - Workload (Access Pattern)

Data Processing Model / Framework


See operations.


You can visualize each data transformation step in a lineage report.
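One simple way to capture lineage is to have each transformation step register itself as it runs. The sketch below is a hypothetical illustration (the `step` decorator and the three stage functions are invented for this example), showing how a pipeline can record the trail of steps that produced a result.

```python
lineage = []  # ordered trail of step names, filled in as the pipeline runs

def step(name):
    """Wrap a transformation so each call is recorded in the lineage trail."""
    def decorator(fn):
        def wrapped(data):
            lineage.append(name)
            return fn(data)
        return wrapped
    return decorator

@step("extract")
def extract(_):
    return [" alice ", "BOB"]

@step("clean")
def clean(rows):
    return [r.strip().lower() for r in rows]

@step("load")
def load(rows):
    return {"loaded": rows}

result = load(clean(extract(None)))
print(lineage)  # ['extract', 'clean', 'load']
```

A lineage report is then just a rendering of this trail, showing which steps touched the data and in what order.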



  • The “360 degree view of the enterprise” is a commonly discussed goal that really means data integration.


  • ETL : Extraction, Transformation and Load Software
  • ELT : Extraction, Load and Transformation Software
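The difference between the two acronyms is purely the order of the steps. A minimal sketch, where `target` stands in for a data warehouse table and `transform` is any cleanup function (both names are hypothetical):

```python
def transform(rows):
    """Any cleanup/enrichment step; here, simple normalization."""
    return [r.upper() for r in rows]

def etl(rows, target):
    # ETL: transform BEFORE loading into the target system.
    target.extend(transform(rows))

def elt(rows, target):
    # ELT: load the raw data first, then transform inside the target.
    target.extend(rows)
    target[:] = transform(target)

a, b = [], []
etl(["x", "y"], a)
elt(["x", "y"], b)
print(a, b)  # ['X', 'Y'] ['X', 'Y']
```

The end state is the same; in practice the choice matters because ELT pushes the transformation work onto the target system (typically the warehouse engine), while ETL does it in a separate processing tier.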

Magic Quadrant

Gartner publishes a Magic Quadrant for Data Integration Tools that ranks ETL software vendors.

