HDFS - File


About

A typical file in HDFS is gigabytes to terabytes in size.

A file is split into one or more blocks.
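As a worked example of the split, the number of blocks is the file size divided by the block size, rounded up. Every block is full except possibly the last, which only holds the remaining bytes. The sketch below assumes a 128 MiB block size, which is the default value of dfs.blocksize in Hadoop 2.x and 3.x; a cluster or an individual file may use a different value. The helper name split_into_blocks is hypothetical.

```python
# Assumed block size: 128 MiB, the default dfs.blocksize in Hadoop 2.x/3.x.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
MiB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of file_size bytes occupies.

    All blocks are full except possibly the last one; an empty file has no blocks.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block stores only the leftover bytes
    return sizes


print(len(split_into_blocks(1024 * MiB)))                # 1 GiB file -> 8
print([s // MiB for s in split_into_blocks(300 * MiB)])  # 300 MiB file -> [128, 128, 44]
```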

Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.
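To make the write-once, single-writer rule concrete, here is a toy in-memory model. It is not the HDFS client API: the class, method and exception names are hypothetical. It only shows the two constraints stated above: existing bytes are never overwritten in place (data can only be appended or cut off by a truncate), and while one client holds the write lease, any other client trying to write is rejected.

```python
from typing import Optional


class LeaseError(Exception):
    """Raised when a client tries to modify a file whose lease another client holds."""


class ToyHdfsFile:
    """Toy model of HDFS write semantics (hypothetical, NOT the HDFS API).

    Bytes can only be appended or truncated, never overwritten in place,
    and at most one client (the lease holder) may write at a time.
    """

    def __init__(self) -> None:
        self._data = bytearray()
        self._writer: Optional[str] = None  # client currently holding the write lease

    def open_for_append(self, client: str) -> None:
        # A second concurrent writer is refused: strictly one writer at any time.
        if self._writer not in (None, client):
            raise LeaseError(f"{client}: file is already open for writing by {self._writer}")
        self._writer = client

    def append(self, client: str, chunk: bytes) -> None:
        if self._writer != client:
            raise LeaseError(f"{client} does not hold the write lease")
        self._data.extend(chunk)  # new bytes only ever go at the end of the file

    def close(self, client: str) -> None:
        if self._writer == client:
            self._writer = None  # release the lease so another client may write

    def truncate(self, client: str, new_length: int) -> None:
        # Truncation is refused while another client is writing the file.
        if self._writer not in (None, client):
            raise LeaseError(f"{client}: file is open for writing by {self._writer}")
        if not 0 <= new_length <= len(self._data):
            raise ValueError("truncate can only shorten the file")
        del self._data[new_length:]

    def read(self) -> bytes:
        return bytes(self._data)


f = ToyHdfsFile()
f.open_for_append("client-a")
f.append("client-a", b"hello world")
try:
    f.open_for_append("client-b")  # rejected: client-a still holds the lease
except LeaseError as err:
    print(err)
f.close("client-a")
f.truncate("client-b", 5)
print(f.read())  # b'hello'
```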

Management

Load/Put

Put a local file into the Hadoop file system with the FS Shell:

hadoop fs -put /data/bacon.txt /user/demo/food/bacon.txt
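To script the same upload from Python, one option is to build the FS Shell command and run it with subprocess. This is a minimal sketch: the helper name hadoop_put is hypothetical, it assumes the hadoop binary is on the PATH when dry_run is False, and it only uses the -f option of -put, which overwrites the destination if it already exists.

```python
import subprocess


def hadoop_put(local_path: str, hdfs_path: str,
               overwrite: bool = False, dry_run: bool = True) -> list[str]:
    """Build (and, unless dry_run, run) a 'hadoop fs -put' command."""
    cmd = ["hadoop", "fs", "-put"]
    if overwrite:
        cmd.append("-f")  # overwrite the destination file if it already exists
    cmd += [local_path, hdfs_path]
    if not dry_run:
        # check=True raises CalledProcessError if the command exits with a non-zero status
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(hadoop_put("/data/bacon.txt", "/user/demo/food/bacon.txt")))
# hadoop fs -put /data/bacon.txt /user/demo/food/bacon.txt
```

Keeping dry_run on by default lets the command be inspected before anything touches the cluster.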

Replication

HDFS stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance.

See HDFS - Block Replication
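Replication multiplies the storage a file consumes: each block is stored once per replica on different DataNodes. The arithmetic sketch below assumes a replication factor of 3 (the default dfs.replication) and a 128 MiB block size; both can be configured per cluster or per file. The helper name replication_footprint is hypothetical.

```python
GiB = 1024 ** 3


def replication_footprint(file_size: int,
                          block_size: int = 128 * 1024 * 1024,
                          replication: int = 3) -> tuple[int, int]:
    """Return (number of block replicas, raw bytes stored across all DataNodes)."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    num_blocks = -(-file_size // block_size)  # ceiling division
    # A partial last block only stores its actual bytes, so raw usage is size * replication.
    return num_blocks * replication, file_size * replication


replicas, raw = replication_footprint(1 * GiB)
print(replicas, raw // GiB)  # 24 3
```

A 1 GiB file therefore occupies 8 blocks, 24 block replicas and about 3 GiB of raw disk space under these assumptions.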

Delete

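If trash is enabled (fs.trash.interval set to a value greater than 0), files deleted with the FS Shell are not immediately removed from HDFS. They are first moved to the user's trash directory and are permanently deleted only after the configured trash interval has passed.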
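With the default trash policy, a deleted file keeps its full original path under the user's .Trash/Current directory, and the -skipTrash option of hadoop fs -rm bypasses the trash entirely. The sketch below only computes paths and commands; it assumes the default home directory prefix /user, and the helper names trash_destination and rm_command are hypothetical.

```python
import posixpath


def trash_destination(user: str, hdfs_path: str) -> str:
    """Where the default trash policy is assumed to move a deleted file:
    /user/<user>/.Trash/Current/<original absolute path>."""
    if not hdfs_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return posixpath.join("/user", user, ".Trash", "Current", hdfs_path.lstrip("/"))


def rm_command(hdfs_path: str, skip_trash: bool = False) -> list[str]:
    """Build a 'hadoop fs -rm' command, optionally bypassing the trash."""
    cmd = ["hadoop", "fs", "-rm"]
    if skip_trash:
        cmd.append("-skipTrash")  # delete immediately instead of moving to the trash
    cmd.append(hdfs_path)
    return cmd


print(trash_destination("demo", "/user/demo/food/bacon.txt"))
# /user/demo/.Trash/Current/user/demo/food/bacon.txt
```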

Name

See the Hive built-in virtual column INPUT__FILE__NAME, which holds the name of the input file for a mapper task.