HDFS - High Availability


The metadata files (FsImage and EditLog) are central data structures of HDFS.

Corruption of these files can leave the HDFS instance non-functional.


In a typical HA cluster:

  • Two separate machines are configured as NameNodes.
  • At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state.
  • The Active NameNode is responsible for all client operations in the cluster.
  • The Standby NameNode serves as a backup, maintaining enough state to provide a fast failover if necessary.
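
The Active/Standby roles described above can be sketched in a few lines of Python. This is a toy model, not Hadoop code: the names NameNode, HAState, handle_client_op and failover are hypothetical and exist only for this illustration. It assumes a failover demotes the old Active before promoting the Standby.

```python
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class NameNode:
    """Hypothetical stand-in for a NameNode; not the Hadoop class."""

    def __init__(self, name, state):
        self.name = name
        self.state = state

    def handle_client_op(self, op):
        # Only the Active NameNode serves client operations.
        if self.state is not HAState.ACTIVE:
            raise RuntimeError(f"{self.name} is in standby and cannot serve {op!r}")
        return f"{self.name} handled {op}"


def failover(active, standby):
    """Swap roles, demoting the old Active first (assumed ordering)
    so that two nodes are never Active at the same time."""
    active.state = HAState.STANDBY
    standby.state = HAState.ACTIVE
    return standby, active  # (new active, new standby)


nn1 = NameNode("nn1", HAState.ACTIVE)
nn2 = NameNode("nn2", HAState.STANDBY)
```

Calling failover(nn1, nn2) makes nn2 the only node that accepts client operations; nn1 then rejects them until it is promoted again.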

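To keep the Standby node's state synchronized with the Active node, both nodes communicate with a group of separate daemons called JournalNodes. The Active node logs each namespace modification to a majority of the JournalNodes, and the Standby node reads those edits and applies them to its own namespace.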
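
How edits might flow through the JournalNodes can be sketched as follows. This is a simplified in-memory model, not the Hadoop implementation: the classes JournalNode, ActiveNameNode and StandbyNameNode and their methods are hypothetical, and the sketch assumes an edit counts as durable once a majority of JournalNodes has accepted it.

```python
class JournalNode:
    """Hypothetical in-memory JournalNode that stores edits by transaction id."""

    def __init__(self):
        self.up = True
        self.edits = {}

    def journal(self, txid, edit):
        if not self.up:
            return False  # unreachable node: no acknowledgement
        self.edits[txid] = edit
        return True


class ActiveNameNode:
    def __init__(self, journal_nodes):
        self.journal_nodes = journal_nodes
        self.next_txid = 1

    def log_edit(self, edit):
        txid = self.next_txid
        acks = sum(jn.journal(txid, edit) for jn in self.journal_nodes)
        # Assumption: a strict majority of JournalNodes must acknowledge.
        if acks <= len(self.journal_nodes) // 2:
            raise RuntimeError("edit not acknowledged by a majority of JournalNodes")
        self.next_txid += 1
        return txid


class StandbyNameNode:
    def __init__(self, journal_nodes):
        self.journal_nodes = journal_nodes
        self.namespace = []     # edits applied so far, in order
        self.last_applied = 0

    def tail_edits(self):
        majority = len(self.journal_nodes) // 2 + 1
        while True:
            txid = self.last_applied + 1
            # Reachable JournalNodes that hold the next edit.
            holders = [jn for jn in self.journal_nodes if jn.up and txid in jn.edits]
            if len(holders) < majority:
                break  # next edit is missing or not yet durable
            self.namespace.append(holders[0].edits[txid])
            self.last_applied = txid


jns = [JournalNode() for _ in range(3)]
active = ActiveNameNode(jns)
standby = StandbyNameNode(jns)
```

With three JournalNodes, the model tolerates one of them being down: the Active can still log edits and the Standby can still catch up. With two down, log_edit raises instead of accepting an edit that could be lost.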

  • The NameNode can be configured to maintain multiple copies of the FsImage and EditLog. Any update to either the FsImage or the EditLog causes every copy to be updated synchronously.
  • Multiple NameNodes either with:

