Data Processing - Replication


Replication means keeping a copy of the same data on multiple machines (nodes) in order to increase:

  • Performance: serve reads in parallel and distribute application workloads across multiple databases
  • Availability: keep the system running if a machine stops working because of an outage, upgrade or maintenance (fault tolerance)

See also: Data Integration - Synchronization


Common replication concepts include:

  • master/slave replication: all write requests are performed on the master and then replicated to the slave(s). Many systems are built this way.
  • quorum: the result of a read or write request is determined by querying a “majority” of the replicas.
  • multimaster: two or more replicas accept writes and synchronize with each other via a transaction identifier.
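
The quorum idea can be illustrated with a small sketch. It assumes the commonly used rule that a read is guaranteed to see the latest write when the write quorum W and the read quorum R satisfy W + R > N for N replicas. The names `Replica`, `quorum_write` and `quorum_read` are hypothetical and exist only for this illustration; they are not the API of any particular system.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""
    def __init__(self):
        self.version = 0
        self.value = None


def quorums_overlap(n, w, r):
    # Every read quorum shares at least one replica with every write quorum
    # exactly when w + r > n.
    return w + r > n


def quorum_write(replicas, value, version, w):
    # Simulate a write acknowledged by only the first w replicas;
    # the remaining replicas stay stale.
    for replica in replicas[:w]:
        replica.version = version
        replica.value = value


def quorum_read(replicas, r):
    # Worst case: contact the last r replicas, the ones least likely
    # to have received the write, and keep the highest version seen.
    contacted = replicas[-r:]
    newest = max(contacted, key=lambda rep: rep.version)
    return newest.value


replicas = [Replica() for _ in range(3)]
quorum_write(replicas, "a", version=1, w=2)
fresh = quorum_read(replicas, r=2)   # quorums overlap on the middle replica
stale = quorum_read(replicas, r=1)   # 2 + 1 is not greater than 3
```

With three replicas, a majority of two for both reads and writes is the smallest setting where the two quorums always intersect. Reading from a single replica can return a stale value.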

Leader based

  • you send your writes to one designated node (which you may call the leader, master or primary),
  • it’s the leader’s responsibility to ensure that the writes are copied to the other nodes (which you may call followers, slaves or standbys).

Replication at each master and subscriber database is controlled by replication agents that communicate through TCP/IP stream sockets. The replication agent on the master database reads the records from the transaction log for the master database. It forwards changes to replicated elements to the replication agent on the subscriber database. The replication agent on the subscriber then applies the updates to its database. If the subscriber agent is not running when the updates are forwarded by the master, the master retains the updates in its transaction log until they can be applied at the subscriber.
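
The agent behaviour described above can be sketched in a few lines. This is a simplified in-memory model, not the implementation of any real product: the classes `Master` and `Subscriber`, the `running` flag and the `forward` method are assumptions made for illustration, and the TCP/IP transport is replaced by a direct method call.

```python
class Subscriber:
    """Subscriber database with its replication agent."""
    def __init__(self):
        self.data = {}
        self.running = True      # whether the subscriber agent is up
        self.applied_lsn = 0     # last log sequence number applied

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.applied_lsn = lsn


class Master:
    """Master database with a transaction log and a replication agent."""
    def __init__(self, subscriber):
        self.data = {}
        self.log = []            # pending records: (lsn, key, value)
        self.next_lsn = 1
        self.subscriber = subscriber

    def write(self, key, value):
        # Apply locally, record the change in the log, then try to forward.
        self.data[key] = value
        self.log.append((self.next_lsn, key, value))
        self.next_lsn += 1
        self.forward()

    def forward(self):
        # If the subscriber agent is down, retain the records in the log.
        if not self.subscriber.running:
            return
        while self.log:
            lsn, key, value = self.log[0]
            self.subscriber.apply(lsn, key, value)
            self.log.pop(0)      # drop the record only once it is applied


sub = Subscriber()
master = Master(sub)
master.write("a", 1)             # forwarded immediately
sub.running = False
master.write("b", 2)             # retained in the master's log
retained = len(master.log)
sub.running = True
master.forward()                 # catch-up once the subscriber is back
```

In this sketch a record leaves the master's log only after the subscriber has applied it, which mirrors the retention behaviour described in the paragraph.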

Replication of databases often relates closely to transactions. If a database can log its individual actions, one can create a duplicate of the data in real time. DBAs can use the duplicate to improve performance and/or the availability of the whole database system.

Database clustering

Parallel synchronous replication of databases enables the replication of transactions on multiple servers simultaneously, which provides a method for backup and security as well as data availability. This is commonly referred to as “database clustering”.


  • Conflict-free Replicated Data Types (CRDTs): a class of data structures that provide strong eventual consistency guarantees for replicated data.
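
As a minimal illustration of a CRDT, the sketch below implements a grow-only counter (often called a G-Counter), one of the simplest known examples. The class name `GCounter` and its methods are chosen for this illustration only. Each node increments its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order or number of merges.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}         # node id -> increments seen from that node

    def increment(self, amount=1):
        # A node only ever updates its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-node maximum is commutative, associative and idempotent,
        # which is what lets replicas converge without coordination.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a = GCounter("a")
b = GCounter("b")
a.increment(2)                   # concurrent updates on two replicas
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)                       # merging again changes nothing
```

After the merges both replicas report the same total, even though neither node coordinated with the other before accepting its increments.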
