HDFS - File System Metadata

This page describes the file system metadata of HDFS.

The NameNode is the repository of all HDFS metadata.



The metadata is stored in two files:

  • the FsImage file, which is the metadata store
  • the EditLog, a transaction log file that records every metadata transaction

The metadata files (FsImage and EditLog) are the central data structures of HDFS. Corruption of these files can render the HDFS instance non-functional. See HDFS - High Availability
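The relationship between the two files can be illustrated with a toy model: a snapshot of the namespace plus a log of changes made since that snapshot. This is only a sketch under stated assumptions. The class `ToyNameNode`, its methods, and the in-memory dictionaries are hypothetical and do not reflect the real on-disk formats of the FsImage or the EditLog.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model (not HDFS code): a snapshot plus a log of later changes."""

    # Stand-in for the FsImage: path -> list of block ids.
    fsimage: dict = field(default_factory=dict)
    # Stand-in for the EditLog: (operation, path, blocks) records.
    edit_log: list = field(default_factory=list)

    def record(self, op, path, blocks=()):
        # Every metadata change is appended to the log.
        self.edit_log.append((op, path, tuple(blocks)))

    def current_namespace(self):
        # Replay the log over the snapshot to get the up-to-date view.
        namespace = {p: list(b) for p, b in self.fsimage.items()}
        for op, path, blocks in self.edit_log:
            if op == "create":
                namespace[path] = list(blocks)
            elif op == "delete":
                namespace.pop(path, None)
        return namespace

    def checkpoint(self):
        # Fold the log into a new snapshot, then start an empty log.
        self.fsimage = self.current_namespace()
        self.edit_log = []
```

In this model, losing either structure loses part of the namespace, which is why damage to the real files is so serious.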


The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separate file in its local file system. The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories appropriately. It is not optimal to create all local files in the same directory because the local file system might not be able to efficiently support a huge number of files in a single directory.

When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files, and sends this report to the NameNode. The report is called the Blockreport.
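The startup scan can be sketched as a walk over the local data directory that collects one entry per block file. This is a minimal illustration, not the DataNode's implementation. The function `build_block_report` is hypothetical, the `blk_<id>` file-name pattern is an assumption made for this sketch, and the real report carries more information than an id and a length.

```python
import os
import re

# Assumed naming for this sketch: one local file per block, named blk_<id>.
BLOCK_FILE = re.compile(r"^blk_(\d+)$")


def build_block_report(data_dir):
    """Walk data_dir and return a sorted list of (block_id, length_in_bytes)."""
    report = []
    # Blocks are spread over subdirectories, so the walk is recursive.
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            match = BLOCK_FILE.match(name)
            if match:
                path = os.path.join(root, name)
                report.append((int(match.group(1)), os.path.getsize(path)))
    return sorted(report)
```

Files that do not match the pattern are ignored, so unrelated files in the data directory do not end up in the report.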


Extended attributes (abbreviated as xattrs) are a filesystem feature that allows user applications to associate additional metadata with a file or directory.
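As an illustration, xattrs can be set and read from the command line with the `-setfattr` and `-getfattr` options of the `hdfs dfs` shell. The sketch below only builds the argument lists and runs nothing. The helper names, the example path, and the attribute name are hypothetical, and the namespace check is a simplification of the rule that an xattr name carries a namespace prefix such as `user.`.

```python
# Namespace prefixes accepted by this sketch (simplified validation).
VALID_NAMESPACES = {"user", "trusted", "system", "security", "raw"}


def _check_name(name):
    # An xattr name is expected to look like "<namespace>.<rest>".
    namespace, sep, rest = name.partition(".")
    if not sep or not rest or namespace.lower() not in VALID_NAMESPACES:
        raise ValueError(f"xattr name needs a namespace prefix: {name!r}")


def setfattr_command(path, name, value):
    """Argument list that sets xattr `name` to `value` on `path`."""
    _check_name(name)
    return ["hdfs", "dfs", "-setfattr", "-n", name, "-v", value, path]


def getfattr_command(path, name=None):
    """Argument list that reads one xattr, or dumps all of them if name is None."""
    if name is None:
        return ["hdfs", "dfs", "-getfattr", "-d", path]
    _check_name(name)
    return ["hdfs", "dfs", "-getfattr", "-n", name, path]
```

For example, `setfattr_command("/data/report.csv", "user.team", "analytics")` returns a list that could be passed to `subprocess.run` on a machine where the `hdfs` client is installed.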
