Data Processing - (Batch|Bulk) Processing

A batch processing system (also called bulk or offline processing) means:

  • starting a process,
  • reading a lot of data in batches (in parallel if possible),
  • and terminating the process.
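
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `read_in_batches` and `run_batch_job`, the default batch size, and the summing workload are all assumptions made for the example. The batches are processed sequentially here; since each batch is independent, they could also be handed to a pool of workers.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_batches(rows: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            # The input is exhausted: no more batches to read.
            return
        yield batch


def run_batch_job(rows: Iterable[int], batch_size: int = 1000) -> int:
    """Start, read all the data batch by batch, then return (terminate)."""
    total = 0
    for batch in read_in_batches(rows, batch_size):
        # Each batch is processed as a whole, not one item per call.
        total += sum(batch)
    return total


if __name__ == "__main__":
    # The process starts, consumes the whole input, prints a result and exits.
    print(run_batch_job(range(10_000)))
```

Unlike a stream processor, the job has a clear end: once the input is exhausted, the function returns and the process terminates.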



Simple code generally iterates over one tuple at a time (for example, looping over the rows in a table). This kind of algorithm is hard to optimize and parallelize compared to declarative, set-oriented languages such as SQL.
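
To make the contrast concrete, here is a small sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the two function names are assumptions made for the example. Both functions compute the same totals: the first loops over one tuple at a time, the second states the result as a single set-oriented SQL query and leaves the execution strategy to the database engine.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10), ("south", 20), ("north", 5), ("south", 1)],
)


def totals_row_by_row(connection: sqlite3.Connection) -> dict:
    """Imperative version: the loop fixes the order and the method of work."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_set_oriented(connection: sqlite3.Connection) -> dict:
    """Declarative version: one statement describes the whole result set."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(connection.execute(query).fetchall())


row_totals = totals_row_by_row(conn)
set_totals = totals_set_oriented(conn)
```

In the declarative version the engine is free to choose how to scan, group, and (in engines that support it) parallelize the work, which is why set-oriented statements are usually the better fit for bulk operations.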

Batch vs Stream processing

See Stream vs Batch

Oracle Database

Documentation about the Oracle database
Oracle Database - Database Writer Process (DBWn)

Database writer process (DBWn) is a background process that writes buffers in the database buffer cache to data files. Modified or new data is not necessarily written to a datafile immediately. To reduce...
Oracle Database - Direct (path insert|load) (/*+ APPEND */)

A direct-path insert is also known as a direct load. A direct-path insert is a bulk operation that bypasses redo log generation only in three cases: the database is in NOARCHIVELOG mode database...
Oracle Database - How can we load a large number of rows into an indexed existing table ?

If you have to perform data load into a or data mart, you must skip the index to minimize the generation of redo log, such as : set the indexes to the state. They are not dropped but just setting...
Oracle Database - NOLOGGING

A table of a statement marked as NOLOGGING will bypass the generation. direct path insert The operation will not be logged and then you prevent the redo log to be generated. It's not in the...
Oracle Database - Redo Size statistics

The redo size statistic shows how much redo log your statement generated when executed. The redo size statistic is most useful when judging the efficiency of large bulk operations with statements...

