Data Processing - (Batch|Bulk) Processing

About

A batch processing system (also called bulk or offline processing) works by:

  • starting a process,
  • reading and processing a large amount of data in batches (in parallel if possible),
  • and terminating the process.
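The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production batch framework: the input is an in-memory range, `process_batch` is a hypothetical unit of work that only sums its batch, and the batch size and worker count are arbitrary. A thread pool keeps the example portable; for CPU-bound work in CPython, `ProcessPoolExecutor` would usually be the better fit because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[int], size: int) -> Iterator[List[int]]:
    # Group the input into fixed-size batches instead of handling one item at a time
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_batch(batch: List[int]) -> int:
    # Hypothetical unit of work (assumption): here it just sums the values in the batch
    return sum(batch)


def run_batch_job(source: Iterable[int], batch_size: int = 1000, workers: int = 4) -> int:
    # 1. Start the process: acquire the resources the job needs (a worker pool)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # 2. Read all the data in batches and process the batches in parallel
        results = pool.map(process_batch, batches(source, batch_size))
        total = sum(results)
    # 3. Terminate: the pool is shut down when the with-block exits, then the job returns
    return total


if __name__ == "__main__":
    print(run_batch_job(range(10_001), batch_size=1000))
```

The job reads its whole input, processes it, and exits; nothing keeps running afterwards waiting for new data, which is what distinguishes it from a stream processor.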

Implementation

Simple code generally iterates over one tuple at a time (for example, looping over the rows of a table). This kind of algorithm is harder to optimize and parallelize than a declarative, set-oriented language such as SQL, where the program states what result is wanted and the engine is free to choose how to compute it.
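The contrast can be illustrated with Python's built-in `sqlite3` module. In this sketch the `orders` table and the tax calculation are hypothetical: `row_at_a_time` fetches every row and issues one `UPDATE` per row, while `set_oriented` expresses the same change as a single declarative `UPDATE`, leaving the database engine free to decide how to carry it out.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical "orders" table, for illustration only
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, amount_with_tax REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(i, float(i * 10)) for i in range(1, 1001)],
    )
    conn.commit()
    return conn


def row_at_a_time(conn: sqlite3.Connection, rate: float) -> None:
    # Procedural style: fetch every row, compute in Python, write each row back one by one
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        conn.execute(
            "UPDATE orders SET amount_with_tax = ? WHERE id = ?",
            (amount * (1 + rate), order_id),
        )
    conn.commit()


def set_oriented(conn: sqlite3.Connection, rate: float) -> None:
    # Declarative style: one statement over the whole set; the engine decides how to execute it
    conn.execute("UPDATE orders SET amount_with_tax = amount * (1 + ?)", (rate,))
    conn.commit()


if __name__ == "__main__":
    a, b = make_db(), make_db()
    row_at_a_time(a, 0.2)
    set_oriented(b, 0.2)
    print(a.execute("SELECT amount_with_tax FROM orders WHERE id = 1").fetchone())
```

Both functions leave the table in the same state. The difference is that the first makes one round trip per row, while the second hands the whole set to the engine in a single statement.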

Batch vs Stream processing

See Stream vs Batch

Discover More
Background process (daemon)

Background processes generally have a low priority. The best-known background processes are services.
Data Processing - Batch

Batch semantics means grouping data into batches in order to manipulate a lot of data at once, as opposed to reading and processing each unit of data separately. The data is stored in a container: for disk...
Data Processing - Lambda Architecture (batch and stream processing)

Nathan Marz wrote a blog post describing the Lambda Architecture, "How to beat the CAP theorem". Lambda architecture is a data-processing architecture designed to handle massive quantities of...
OWB - Mapping Configuration

What the different configuration values of a mapping mean. Property / Description: Deployable: if you disable it, scripts for that mapping are not generated. Language: Warehouse Builder sets...
Oracle Database

Documentation about the Oracle database
Oracle Database - Database Writer Process (DBWn)

The database writer process (DBWn) is a background process that writes buffers in the database buffer cache to data files. Modified or new data is not necessarily written to a datafile immediately. To reduce...
Oracle Database - Direct (path insert|load) (/*+ APPEND */)

A direct-path insert is also known as a direct load. A direct-path insert is a bulk operation that bypasses redo log generation in only three cases: the database is in NOARCHIVELOG mode, database...
Oracle Database - How can we load a large number of rows into an indexed existing table ?

If you have to load data into a data warehouse or data mart, you must skip index maintenance to minimize redo log generation, for example: set the indexes to the UNUSABLE state. They are not dropped but just setting...
Oracle Database - NOLOGGING

A table or statement marked as NOLOGGING will bypass redo log generation during a direct-path insert. The operation is not logged, which prevents redo from being generated. It's not in the...
Oracle Database - Redo Size statistics

The redo size statistic shows how much redo your statement generated when executed. The redo size statistic is most useful when judging the efficiency of large bulk operations with statements...