Table of Contents

Distributed Database - CAP Theorem (Consistency, Availability, Partition Tolerance)

About

This theorem from Eric Brewer in 2000, followed up later by Lynch in 2002 state that a distributed database can't get all these three notions at the same time:

You have to choose two or sacrifice performance.

If two node of the system cannot talk to each other, can they make forward progress on their own?

Cap Theorem Database Type

If you update a value on node A there are 2 options to propagate the change to B after a network failure:

The only way to obtain all notions is to run the system on a single node and this is no more a distributed system.

See also Data Property - ACID (atomicity, consistency, isolation, durability) for conventional database

We as a field must stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable is via principled abstractions that allow the practitioner the choice of appropriate tradeoffs along the axes of interest:

  • correctness,
  • latency,
  • and cost

The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

Type

As you can't have all three properties at the same time, the systems need to choose two.

As AC refers to traditional database, the choice is really between consistency versus availability in case of a network partition or failure. When choosing:

Documentation / Reference