About

A typical file in HDFS is gigabytes to terabytes in size.

A file is split into one or more blocks (128 MB each by default in Hadoop 2 and later).

Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.
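To inspect how HDFS has split a file into blocks, the fsck tool can be used. A minimal sketch, assuming the file /user/demo/food/bacon.txt from the Load/Put example below already exists; the output lists each block of the file together with the DataNodes holding its replicas:

hdfs fsck /user/demo/food/bacon.txt -files -blocks -locations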

Management

Load/Put

Put a local file into the Hadoop file system with the FS Shell:

hadoop fs -put /data/bacon.txt /user/demo/food/bacon.txt
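To confirm the upload, list the destination directory (a usage sketch based on the path above):

hadoop fs -ls /user/demo/food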

Replication

HDFS stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance.

See HDFS - Block Replication
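The replication factor of an existing file can also be changed with the FS Shell. A minimal sketch, reusing the example path from above; the factor 3 is illustrative, and -w waits until replication reaches the target:

hadoop fs -setrep -w 3 /user/demo/food/bacon.txt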

Delete

If the trash configuration is enabled, files removed by the FS Shell are not immediately removed from HDFS; they are first moved to the user's trash directory (/user/<username>/.Trash).
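A sketch of the corresponding FS Shell commands, reusing the example path from above:

# Moves the file to the user's trash directory when trash is enabled
hadoop fs -rm /user/demo/food/bacon.txt

# Bypasses the trash and deletes the file immediately
hadoop fs -rm -skipTrash /user/demo/food/bacon.txt

# Empties the current user's trash
hadoop fs -expunge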

Name

See the built-in Hive virtual column INPUT__FILE__NAME, which exposes the input file's name for a mapper task.
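A minimal sketch of querying that column from the command line, assuming a hypothetical Hive table named my_table:

hive -e "SELECT INPUT__FILE__NAME FROM my_table LIMIT 5;"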