Distributed Database - CAP Theorem (Consistency, Availability, Partition Tolerance)

Data System Architecture

About

This theorem from Eric Brewer in 2000, followed up later by Lynch in 2002 state that a distributed database can't get all these three notions at the same time:

You have to choose two or sacrifice performance.

If two node of the system cannot talk to each other, can they make forward progress on their own?

  • If not, you sacrifice Availability
  • If so, you might have to sacrifice Consistency.

Cap Theorem Database Type

If you update a value on node A there are 2 options to propagate the change to B after a network failure:

  • asynchronously: The update will have to wait and B will be in a inconsistent state
  • synchronously: The entire system will not be available until B receives the update and that's can take a lot of time.

The only way to obtain all notions is to run the system on a single node and this is no more a distributed system.

We as a field must stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable is via principled abstractions that allow the practitioner the choice of appropriate tradeoffs along the axes of interest:

  • correctness,
  • latency,
  • and cost

The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

Type

As you can't have all three properties at the same time, the systems need to choose two.

As AC refers to traditional database, the choice is really between consistency versus availability in case of a network partition or failure. When choosing:

  • CP, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.
  • AP, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.

Documentation / Reference





Discover More
Data Modeling Chebotko Logical
Cassandra NoSql Database

Cassandra is a NoSql database for transactional workloads that require high scale and maximum availability. Cassandra is suited for transactional workloads at high volume and shouldn’t be considered...
Diagram Of Lambda Architecture Generic
Data Processing - Lambda Architecture (batch and stream processing)

nathanmarzNathan Marz wrote a blog post describing the Lambda Architecture: How to beat the CAP theorem Lambda architecture is a data-processing architecture designed to handle massive quantities of...
Data System Architecture
Data Properties and Transactions

This section is : transactions are the mechanism that guarantee data ACID (atomicity, consistency, isolation, durability) properties during concurrent data changes (ie multiple user or program...
Data System Architecture
Data Property - ACID (atomicity, consistency, isolation, durability)

In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee transactions are processed reliably. They defines a concurrency model. Traditional database...
Data System Architecture
Data Property - BASE (Basically Available)

BASE is a set of properties for systems that prefer in the CAP theorem: Availability (A) and Partition tolerance (P) over Consistency (C) A base system is therefore also known as a AP (Available...
Data System Architecture
Data Property - Partition tolerance (System Property)

Partition tolerance means that the system continues to work even if nodes can no longer communicate. In the context of the cap theorem, partition tolerance means that the system continue to work even...
Data System Architecture
Database management system (DBMS)

A Database_management_systemDBMS is a set of software programs that controls the organization, storage, management, and retrieval of data. A database is a collection of permanently stored data used by...
Data System Architecture
Distributed - Database / Application

Distributed systems is the opposite of a single node (ie computer). Scale across the system bus before you scale across the network with distributed systems. It is all the same design principles. Think...
Data System Architecture
Distributed System - Network Partition

Network partition in the context of distributed system. For a subnet network partition, see A network partition refers to a network split between nodes due to the failure of network devices. Example:...
Data System Architecture
NewSQL

are distributed database that prioritize consistency over availability - See Spanner (and its Cloud Spanner counterpart), FaunaDB, CockroachDB, YugaByte. In the event of a network partition,...



Share this page:
Follow us:
Task Runner