A client establishes a connection to a configurable TCP port on the NameNode machine and communicates with the NameNode using the ClientProtocol.
A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol.
When a client retrieves file contents, it performs a data integrity check on each block. If the check fails, the client can opt to retrieve a replica of that block from another DataNode.
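The verify-then-fall-back behaviour described above can be sketched as follows. This is a minimal illustration, not the HDFS client code: the `Replica` class, the `readVerified` helper, and the use of a single CRC32 per block are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.zip.CRC32;

final class ReplicaReadSketch {

    /** Hypothetical view of one block replica: where it lives, its bytes, and the checksum recorded at write time. */
    static final class Replica {
        final String dataNode;
        final byte[] data;
        final long expectedChecksum;

        Replica(String dataNode, byte[] data, long expectedChecksum) {
            this.dataNode = dataNode;
            this.data = data;
            this.expectedChecksum = expectedChecksum;
        }
    }

    /** Computes a CRC32 over the block contents (assumed checksum scheme for this sketch). */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Returns the first replica whose contents match its recorded checksum, trying the replicas in order. */
    static Optional<Replica> readVerified(List<Replica> replicas) {
        for (Replica replica : replicas) {
            if (checksum(replica.data) == replica.expectedChecksum) {
                return Optional.of(replica);
            }
            // Checksum mismatch: skip this copy and try the next DataNode.
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] good = "block-0 contents".getBytes(StandardCharsets.UTF_8);
        long recorded = checksum(good);

        // Simulate a corrupted copy by flipping one bit.
        byte[] corrupt = good.clone();
        corrupt[0] ^= 0x01;

        List<Replica> replicas = Arrays.asList(
                new Replica("datanode-1", corrupt, recorded),
                new Replica("datanode-2", good, recorded));

        Optional<Replica> result = readVerified(replicas);
        System.out.println(result.isPresent()
                ? "Read block from " + result.get().dataNode
                : "No valid replica found");
    }
}
```

In this sketch the corrupted copy on the first node is rejected and the read is served from the second node. If every replica fails verification, the caller receives an empty result and has to decide how to report the error.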
Lazy Persist writes: The DataNodes flush in-memory data to disk asynchronously, which removes expensive disk I/O and checksum computations from the performance-sensitive I/O path. See Memory Storage Support in HDFS.
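The idea of acknowledging a write from memory and persisting it later can be illustrated with a small sketch. It is a toy model only, not the DataNode implementation: the `LazyPersistSketch` class, its method names, and the single background flusher thread are all assumptions for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LazyPersistSketch implements AutoCloseable {
    // In-memory copies of blocks, readable immediately after write() returns.
    private final Map<String, byte[]> memory = new ConcurrentHashMap<>();
    // Background thread that performs the slow disk writes.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private final Path directory;

    LazyPersistSketch(Path directory) {
        this.directory = directory;
    }

    /**
     * Stores the block in memory and returns without waiting for the disk.
     * The returned Future completes once the block has been flushed to a file.
     */
    Future<Path> write(String blockId, byte[] data) {
        byte[] copy = data.clone();
        memory.put(blockId, copy);
        return flusher.submit(() -> {
            Path target = directory.resolve(blockId);
            Files.write(target, copy); // disk I/O happens off the caller's path
            return target;
        });
    }

    /** Reads the in-memory copy, or null if the block is unknown. */
    byte[] readFromMemory(String blockId) {
        return memory.get(blockId);
    }

    @Override
    public void close() {
        flusher.shutdown();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("lazy-persist-sketch");
        try (LazyPersistSketch store = new LazyPersistSketch(dir)) {
            byte[] payload = {1, 2, 3};
            Future<Path> flushed = store.write("blk_1", payload);
            // The block is readable from memory before the flush is awaited.
            System.out.println(Arrays.equals(store.readFromMemory("blk_1"), payload));
            // Waiting on the Future blocks until the data is on disk.
            System.out.println(Files.size(flushed.get()));
        }
    }
}
```

The trade-off visible in the sketch is that `write` returns before the data is durable, so anything still only in memory is lost if the process stops before the background flush completes.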
- A C language wrapper for this Java API
- NFS gateway: HDFS can be mounted as part of the client’s local file system.