HDFS - Checkpoint

About

During a checkpoint, the changes recorded in the transaction log (EditLog) are applied to the metadata store (FsImage). Changes are batched this way because writing each change directly to the FsImage would not be efficient.

Checkpoint process

When the NameNode starts up, or when a checkpoint is triggered by a configurable threshold:

  • it reads the FsImage and EditLog from disk
  • it applies all the transactions from the EditLog to the in-memory representation of the FsImage
  • it flushes out this new version into a new FsImage on disk.
  • it truncates the old EditLog, because its transactions have now been applied to the persistent FsImage.
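The four steps above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the NameNode's code: the FsImage is assumed to be a JSON file mapping paths to metadata, the EditLog a JSON-lines file of hypothetical `{"op", "path", "value"}` records, and the names `apply_edit` and `checkpoint` are made up for illustration. The real on-disk formats are binary and far richer.

```python
import json
import os
import tempfile

# Toy model (assumption, not the real HDFS formats):
#   FsImage -> a JSON file holding {path: metadata}
#   EditLog -> a JSON-lines file of {"op": ..., "path": ..., "value": ...} records


def apply_edit(namespace, record):
    """Apply one logged transaction to the in-memory namespace."""
    op, path = record["op"], record["path"]
    if op == "create":
        namespace[path] = record.get("value", {})
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[record["value"]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image_path, edits_path):
    """Merge the EditLog into the FsImage and return the new namespace."""
    # 1. Read the FsImage and the EditLog from disk.
    with open(image_path) as f:
        namespace = json.load(f)
    with open(edits_path) as f:
        edits = [json.loads(line) for line in f if line.strip()]

    # 2. Apply every transaction to the in-memory copy of the FsImage.
    for record in edits:
        apply_edit(namespace, record)

    # 3. Write the new version to a new file, then swap it in place of the old FsImage.
    new_image = image_path + ".ckpt"
    with open(new_image, "w") as f:
        json.dump(namespace, f)
    os.replace(new_image, image_path)

    # 4. Truncate the EditLog: its transactions are now in the persistent FsImage.
    open(edits_path, "w").close()
    return namespace


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "fsimage.json")
        edits = os.path.join(d, "edits.log")
        with open(image, "w") as f:
            json.dump({"/a": {"replication": 3}}, f)
        with open(edits, "w") as f:
            f.write(json.dumps({"op": "create", "path": "/b", "value": {"replication": 2}}) + "\n")
            f.write(json.dumps({"op": "delete", "path": "/a"}) + "\n")
        print(checkpoint(image, edits))  # {'/b': {'replication': 2}}
```

Writing the new image to a separate file before swapping it in means a crash during step 3 leaves the previous FsImage intact, which is why the log is only truncated after the swap.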

Management

Trigger / Run

A checkpoint can be triggered:

  • at a given time interval (dfs.namenode.checkpoint.period) expressed in seconds,
  • or after a given number of filesystem transactions have accumulated (dfs.namenode.checkpoint.txns).

If both of these properties are set, the first threshold to be reached triggers a checkpoint.

From the configuration file (hdfs-site.xml):

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>21600</value>
</property>

<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>

or from the command line:

hdfs getconf -confKey dfs.namenode.checkpoint.period
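The "first threshold reached" rule described above can be sketched as a small decision function. The class `CheckpointPolicy` and its method `due` are hypothetical names, and the defaults are simply the values from the configuration example in this section. The sketch does not model how often the checkpointing daemon actually evaluates these conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointPolicy:
    """Hypothetical helper modelling the 'first threshold reached' rule."""
    period_seconds: int = 21600   # dfs.namenode.checkpoint.period (seconds), from the example above
    max_txns: int = 1_000_000     # dfs.namenode.checkpoint.txns, from the example above

    def due(self, seconds_since_last: float, txns_since_last: int) -> Optional[str]:
        """Return which threshold triggers a checkpoint, or None if neither is reached yet."""
        if txns_since_last >= self.max_txns:
            return "txns"
        if seconds_since_last >= self.period_seconds:
            return "period"
        return None


policy = CheckpointPolicy()
print(policy.due(3600, 1_200_000))   # txns
print(policy.due(21600, 10))         # period
print(policy.due(60, 10))            # None
```

With these values a checkpoint happens at least every 6 hours (21600 s), and sooner on a busy cluster that accumulates one million transactions first.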

Location

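The checkpoint directories, where the Secondary NameNode stores the temporary images and edits to merge, are set in the configuration file (hdfs-site.xml):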

<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/hadoop/hdfs/namesecondary</value>
</property>

<property>
  <name>dfs.namenode.checkpoint.edits.dir</name>
  <value>${dfs.namenode.checkpoint.dir}</value>
</property>

or from the command line:

hdfs getconf -confKey dfs.namenode.checkpoint.dir
/hadoop/hdfs/namesecondary
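The edits directory defaults to `${dfs.namenode.checkpoint.dir}`, a reference to another property. `hdfs getconf` should report the already-resolved value. The sketch below only illustrates how such a reference resolves. It assumes the properties sit in a single XML string wrapped in a `<configuration>` element and ignores Java system properties, multiple configuration resources and `final` flags. `HDFS_SITE`, `load_properties` and `resolve` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

HDFS_SITE = """<configuration>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/hadoop/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>${dfs.namenode.checkpoint.dir}</value>
  </property>
</configuration>"""

VAR = re.compile(r"\$\{([^}]+)\}")  # matches ${property.name}


def load_properties(xml_text):
    """Map each <name> to its raw (unexpanded) <value>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def resolve(props, key, max_depth=20):
    """Expand ${...} references repeatedly; unknown references are left as-is."""
    value = props[key]
    for _ in range(max_depth):
        expanded = VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return expanded
        value = expanded
    raise ValueError(f"too many substitutions while resolving {key}")


props = load_properties(HDFS_SITE)
print(resolve(props, "dfs.namenode.checkpoint.edits.dir"))  # /hadoop/hdfs/namesecondary
```

The depth limit stops a cyclic definition (for example two properties referencing each other) from looping forever.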
