HDFS - FsImage File

About

The HDFS file system metadata are stored in a file called the FsImage.

It contains:

  • the entire file system namespace
  • the mapping of blocks to files
  • and file system properties

Management

Location

The FsImage is stored as a file in the NameNode’s local file system.

The location is defined by the dfs.namenode.name.dir property in HDFS - Configuration (hdfs-site.xml). Example:

<property>
	<name>dfs.namenode.name.dir</name>
	<value>file:/hadoop/data/dfs/namenode</value>
</property>

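Example of reading the effective value on a running cluster (the path shown comes from a different cluster than the snippet above):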

hdfs getconf -confKey dfs.namenode.name.dir
/hadoop/hdfs/namenode
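
Once the directory is known, the most recent checkpoint can be located by file name. The following is a minimal sketch, assuming the image files sit in the current subdirectory and are named fsimage_<transaction id> with .md5 checksum files next to them. The function name latest_fsimage is hypothetical and not part of Hadoop.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are named fsimage_<transaction id>;
# the fsimage_<txid>.md5 checksum siblings must not match.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def latest_fsimage(name_dir: str) -> Optional[Path]:
    """Return the fsimage file with the highest transaction id, or None."""
    current = Path(name_dir) / "current"
    if not current.is_dir():
        return None
    candidates = []
    for entry in current.iterdir():
        match = FSIMAGE_RE.match(entry.name)
        if match:
            # Compare numerically rather than relying on zero padding.
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]
```

Usage: latest_fsimage("/hadoop/hdfs/namenode") returns the path of the newest image, which can then be passed to the Offline Image Viewer.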

Modification

Reading an FsImage is efficient, but applying incremental edits directly to it is not. Instead of rewriting the FsImage for each change, the NameNode persists every edit in the EditLog. During a checkpoint, the changes recorded in the EditLog are applied to the FsImage and a new FsImage is written.
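
The relationship between the image, the edit log and the checkpoint can be illustrated with a toy model. This is only a conceptual sketch: the class ToyNameNode, its fields and the two operations create and delete are hypothetical, and the real on-disk formats are far richer. The returned file name assumes the zero-padded fsimage_<transaction id> naming seen elsewhere on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyNameNode:
    """Toy model: the image is a snapshot, the edit log is an append-only journal."""
    image: Dict[str, int] = field(default_factory=dict)  # path -> size in bytes
    image_txid: int = 0                                   # last txid folded into the image
    edit_log: List[Tuple[int, str, str, int]] = field(default_factory=list)
    last_txid: int = 0

    def log_edit(self, op: str, path: str, size: int = 0) -> None:
        # Cheap append: the image itself is not touched here.
        self.last_txid += 1
        self.edit_log.append((self.last_txid, op, path, size))

    def checkpoint(self) -> str:
        # Replay every pending edit in order against the image.
        for txid, op, path, size in self.edit_log:
            if op == "create":
                self.image[path] = size
            elif op == "delete":
                self.image.pop(path, None)
            else:
                raise ValueError(f"unknown operation: {op}")
            self.image_txid = txid
        self.edit_log.clear()
        # Name of the image this checkpoint would produce.
        return f"fsimage_{self.image_txid:019d}"
```

After three calls to log_edit the image is still empty; only checkpoint() folds the edits into it and empties the journal.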

Xml

The Offline Image Viewer (OIV) is a tool that dumps the contents of HDFS fsimage files to a human-readable format and provides a read-only WebHDFS API, allowing offline analysis and examination of a Hadoop cluster’s namespace.

Example:

hdfs oiv -p XML -i fsimage_0000000000000307728 -o fsimage.xml

Result: fsimage.xml
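
The XML dump can then be analysed with any XML parser. The sketch below counts inodes by type while streaming the file, so that a large dump does not need to fit in memory. It assumes the dump contains an INodeSection element whose inode children each carry a type child (for example FILE or DIRECTORY); the exact layout can vary between Hadoop versions, so check it against your own fsimage.xml. The function name count_inode_types is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_inode_types(xml_source) -> Counter:
    """Count <inode> elements per <type> inside <INodeSection>.

    xml_source is a file name or a binary file object.
    """
    counts = Counter()
    in_inode_section = False
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if elem.tag == "INodeSection":
            # True between the opening and closing tag of the section.
            in_inode_section = (event == "start")
        elif event == "end" and elem.tag == "inode" and in_inode_section:
            counts[elem.findtext("type", default="UNKNOWN")] += 1
            elem.clear()  # drop the children of processed inodes to limit memory use
    return counts
```

Usage: count_inode_types("fsimage.xml") returns a Counter keyed by inode type.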

Download

The option -fetchImage <local directory> of dfsadmin downloads the most recent fsimage from the NameNode and saves it in the specified local directory.

Example:

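The log shows that the client fetches the image over HTTP from the NameNode's /imagetransfer endpoint (this is not the WebHDFS REST API, which is served under /webhdfs/v1).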

hdfs dfsadmin -D "fs.default.name=hdfs://headnode/" -fetchImage .
18/04/09 14:37:40 INFO namenode.TransferFsImage: Opening connection to http://hn0.ax.internal.cloudapp.net:30070/imagetransfer?getimage=1&txid=latest
18/04/09 14:37:40 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/04/09 14:37:41 INFO namenode.TransferFsImage: Combined time for fsimage download and fsync to all disks took 0.05s. The fsimage download took 0.05s at 108.70 KB/s. Synchronous (fsync) write to disk of /tmp/./fsimage_0000000000000307728 took 0.00s.
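
The download can also be scripted. The following is a minimal sketch that wraps the dfsadmin call shown above; it assumes the hdfs command is on the PATH and passes the NameNode URI through fs.defaultFS, the current name of the deprecated fs.default.name property. The function names fetch_image_command and fetch_image are hypothetical.

```python
import subprocess
from typing import List, Optional


def fetch_image_command(local_dir: str, default_fs: Optional[str] = None) -> List[str]:
    """Build the argument list for 'hdfs dfsadmin -fetchImage'."""
    cmd = ["hdfs", "dfsadmin"]
    if default_fs:
        # Generic option: override the default file system for this call only.
        cmd += ["-D", f"fs.defaultFS={default_fs}"]
    cmd += ["-fetchImage", local_dir]
    return cmd


def fetch_image(local_dir: str, default_fs: Optional[str] = None) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(fetch_image_command(local_dir, default_fs), check=True)
```

Usage: fetch_image(".", "hdfs://headnode/") is equivalent to the command line above, apart from the property name.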

Version

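The VERSION file is located in the current subdirectory of the FsImage location (FSIMAGE_HOME below):

cat FSIMAGE_HOME/current/VERSION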
# cat /hadoop/hdfs/namenode/current/VERSION
#Mon Apr 09 08:57:32 UTC 2018
namespaceID=1498378884
clusterID=CID-f09ee152-c799-471f-8849-ebed190b31fe
cTime=0
storageType=NAME_NODE
blockpoolID=BP-272822339-10.10.6.20-1521626942449
layoutVersion=-63
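
The VERSION file uses a simple key=value layout with # comment lines, so it is easy to read from a script. The following is a minimal sketch, assuming no escaped characters or line continuations are present (the full Java properties syntax is not handled). The function name parse_version_file is hypothetical.

```python
from typing import Dict


def parse_version_file(text: str) -> Dict[str, str]:
    """Parse the key=value lines of a VERSION file, skipping comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment such as the timestamp header
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props
```

Usage: parse_version_file(open("/hadoop/hdfs/namenode/current/VERSION").read())["layoutVersion"] gives the layout version as a string, "-63" for the file shown above.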
