HDFS - Block Replication

Yarn Hortonworks

About

Data Processing - Replication in HDFS

HDFS stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance.

The NameNode makes all decisions regarding replication of blocks. It periodically receives a Blockreport from each of the DataNodes in the cluster. A Blockreport contains a list of all blocks on a DataNode.

DataNode death may cause the replication factor of some blocks to fall below their specified value.

The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary.

State

A block is considered safely replicated when the minimum number of replicas of that data block has checked in with the NameNode through a blockreport.

Algorithm

The section below describes an idealistic situation.

The algorithm may be influenced by the Storage Types and Storage Policies. The NameNode will take the policy into account for replica placement in addition to the rack awareness.

Replication of data blocks does not occur when the NameNode is in the Safemode state.

Replication factor = 3

When the replication factor is three, HDFS’s placement policy is to put:

  • one replica:
    • on the local machine if the writer is on a datanode,
      • otherwise on a random datanode,
  • another replica on a node in a different (remote) rack,
  • and the last on a different node in the same remote rack.

This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure.

Replication factor > 3

If the replication factor is greater than 3, the placement of the 4th and following replicas are determined randomly while keeping the number of replicas per rack below the upper limit (which is basically (replicas - 1) / racks + 2).

Management

Block Size

The block size is configurable per file.

Replication Factor

The common case is 3.

An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later.

Because the NameNode does not allow DataNodes to have multiple replicas of the same block, maximum number of replicas created is the total number of DataNodes at that time.

Default value is in HDFS - Configuration (hdfs-site.xml)

<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>

Decrease

When the replication factor of a file is reduced, the NameNode selects excess replicas that can be deleted. The next Heartbeat transfers this information to the DataNode. The DataNode then removes the corresponding blocks and the corresponding free space appears in the cluster. Once again, there might be a time delay between the completion of the setReplication API call and the appearance of free space in the cluster.

Move

See HDFS - Block

Under-replicated Block

See Under-replicated Block

Pipeline

During a replication, the data is pipelined from one DataNode to the next.

Process:

  • A client is writing data to an HDFS file with a replication factor of three
  • The NameNode retrieves the list of DataNodes using a replication target choosing algorithm. This list contains the DataNodes that will host a replica of that block.
  • The client writes data to the first DataNode.
  • The first DataNode starts receiving the data in portions, writes each portion to its local repository and transfers that portion to the second DataNode in the list.
  • The second DataNode, in turn starts receiving each portion of the data block, writes that portion to its repository and then flushes that portion to the third DataNode.
  • Finally, the third DataNode writes the data to its local repository.

A DataNode can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline.





Discover More
Azure Storage Structure
Azure - Windows Azure Storage Blob (WASB) - HDFS

Windows Azure Storage Blob (WASB) is an file system implemented as an extension built on top of the HDFS APIs and is in many ways HDFS. The WASB variation uses: SSL certificates for improved security...
Hdfs Ui Block Information
HDFS - Block

in HDFS. The block size can be changed by file. Block are stored on a datanode and are grouped in block pool The location on where the blocks are stored is defined in hdfs-site.xml....
Yarn Hortonworks
HDFS - Blockreport

A blockreport is a list of all HDFS data blocks that correspond to each of the local files, and sends this report to the NameNode. Each datanode create and send this report to the namenode: when the...
Yarn Hortonworks
HDFS - File

A typical file in HDFS is gigabytes to terabytes in size. A file is split into one or more blocks. Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any...
Yarn Hortonworks
HDFS - NameNode

NameNode is an HDFS daemon that run on the head node. It' s the head process of the cluster that manages: the file system namespace and regulates access to files by clients. The NameNode: executes...



Share this page:
Follow us:
Task Runner