System - Availability



Availability is the absence (or elimination) of downtime.

Availability means that a system still works after a node (computer) failure.

Availability means that you can always read and write to the system.

Availability is an SLI (service level indicator) resource metric that shows the percentage of time that the resource responded to requests.
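
As a rough illustration of this metric, the sketch below computes availability as a percentage from an observation window. The function name `availability_percent` and the idea of counting probes at a fixed interval are assumptions for the example, not part of any particular monitoring tool.

```python
def availability_percent(responded: int, total: int) -> float:
    """Percentage of probes (taken at a fixed interval) that the resource answered.

    Hypothetical helper: `responded` and `total` are probe counts over
    the same observation window.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= responded <= total:
        raise ValueError("responded must be between 0 and total")
    return 100.0 * responded / total


# Example: 10 missed probes out of 10,000.
print(f"{availability_percent(9_990, 10_000):.2f}%")
```

With probes at a fixed interval, the ratio of answered probes approximates the share of time the resource was responding.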

In the real world, there is no such thing as 100% availability. Highly available systems are defined in terms of 9s (99.9%, 99.99%, …). The more 9s, the better.
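
To make the 9s concrete, the sketch below converts an availability target into the downtime it allows per year. It assumes a 365-day year, and the helper name `allowed_downtime_seconds` is made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_seconds * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_seconds(target) / 3600
    print(f"{target}% -> {hours:.2f} hours of downtime per year")
```

Under that assumption, 99.9% leaves about 8.76 hours of downtime per year, and each additional 9 divides the budget by ten.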

No system can guarantee 100% availability.


A high availability requirement implies that the system must replicate data.
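
A back-of-the-envelope sketch of why redundancy helps: if each replica is available with probability `a`, and failures are assumed to be independent (a strong assumption that rarely holds exactly in practice), the chance that at least one of `n` replicas is up is `1 - (1 - a)^n`. The function name is hypothetical.

```python
def replicated_availability(a: float, n: int) -> float:
    """Probability that at least one of n replicas is up.

    Assumes independent failures and that a single live replica
    is enough to serve requests.
    """
    if not 0 <= a <= 1:
        raise ValueError("a must be a probability between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 - (1 - a) ** n


for replicas in (1, 2, 3):
    print(replicas, f"{replicated_availability(0.99, replicas):.6f}")
```

This ignores correlated failures (shared power, network, or software bugs) and the cost of keeping the replicas in sync.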


Availability is a combination of:

  • how often things are unavailable - MTBF (mean time between failures)
  • and how long they remain that way - MTTR (mean time to recovery).


Visualisation (see the Desmos calculator):

<MATH> f(x, y) = \frac{y}{y + x}, \quad x \ge 0, \; y \ge 0 </MATH>
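
The formula above matches the usual steady-state relation availability = MTBF / (MTBF + MTTR), assuming that x stands for MTTR and y for MTBF (the source does not name the axes). A minimal sketch, with a hypothetical function name:

```python
def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Fraction of time a system is up, given MTBF and MTTR in the same unit."""
    if mtbf < 0 or mttr < 0:
        raise ValueError("mtbf and mttr must be non-negative")
    if mtbf + mttr == 0:
        raise ValueError("mtbf and mttr cannot both be zero")
    return mtbf / (mtbf + mttr)


# Example: a failure every 999 hours on average, each repaired in 1 hour.
print(f"{availability_from_mtbf(mtbf=999, mttr=1):.3%}")
```

The same availability can be reached by failing less often (larger MTBF) or by recovering faster (smaller MTTR).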

