HDFS - Hadoop Archive file (har)

1 - About

A Hadoop archive (HAR) is a special-format archive that maps to a file system directory. It always has a .har extension and contains metadata (the _index and _masterindex files) and data (part-* files).

3 - Concept

An archive:

  • exposes itself as a file system layer on top of HDFS. All fs shell commands work on the files in an archive, but with a different URI (the har scheme).
  • is immutable. Rename, delete and create commands return an error.

4 - Management

4.1 - URI

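  • Qualified URI

har://scheme-hostname:port/archivepath/fileinarchive

  • Default file system (scheme, host and port omitted)

har:///archivepath/fileinarchive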
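
As an illustration, both URI forms can be assembled with a small shell helper. This is only a sketch: it assumes the har://scheme-hostname:port/archivepath/fileinarchive layout from the Hadoop Archives guide, `har_uri` is a hypothetical helper name, and the host, port and paths are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build a har:// URI for a file inside an archive.
# Usage: har_uri <archive_path> <file_in_archive> [<scheme> <host> <port>]
har_uri() {
  if [ "$#" -ge 5 ]; then
    # Qualified form: har://scheme-hostname:port/archivepath/fileinarchive
    printf 'har://%s-%s:%s%s/%s\n' "$3" "$4" "$5" "$1" "$2"
  else
    # Default file system form: har:///archivepath/fileinarchive
    # (the archive path is absolute, so it supplies the third slash)
    printf 'har://%s/%s\n' "$1" "$2"
  fi
}

# Placeholder values for illustration only.
har_uri /user/zoo/foo.har dir1
har_uri /user/zoo/foo.har dir1 hdfs namenode.example.com 8020
```

The first call uses the default file system from the client configuration; the second names the underlying file system explicitly.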


4.2 - Create

With the hadoop client utility:

hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>
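
A concrete invocation may make the syntax easier to read. The sketch below only assembles and prints the command line, so it can be tried without a cluster. `build_archive_cmd` is a hypothetical helper, the archive name and paths are placeholders, and the optional -r flag is left out.

```shell
#!/bin/sh
# Hypothetical helper: assemble a `hadoop archive` command line.
# Usage: build_archive_cmd <name.har> <parent> <dest> <src>...
# Sources are given relative to the parent path.
# Assumes paths contain no spaces.
build_archive_cmd() {
  name=$1
  parent=$2
  dest=$3
  shift 3
  # "$*" joins the remaining arguments (the sources) with spaces.
  printf 'hadoop archive -archiveName %s -p %s %s %s\n' \
    "$name" "$parent" "$*" "$dest"
}

# Archive /user/hadoop/dir1 and /user/hadoop/dir2 into /user/zoo/foo.har.
cmd=$(build_archive_cmd foo.har /user/hadoop /user/zoo dir1 dir2)
echo "$cmd"
```

On a machine with a configured Hadoop client, the printed line can be pasted into the shell to run the archiving job.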

4.3 - Unarchive

Since all the fs shell commands in the archives work transparently, unarchiving is just a matter of copying.

To unarchive:

  • sequentially (with the hdfs command line)

hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir

  • in parallel (with DistCp)

hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
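
The choice between a sequential copy (hdfs dfs -cp) and a parallel one (hadoop distcp) can be wrapped in a small script. The following is a minimal sketch that only prints the command it would run; `unarchive_cmd` and its mode names are hypothetical, and the paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: print the copy command for unarchiving.
# Usage: unarchive_cmd <sequential|parallel> <har_source> <destination>
unarchive_cmd() {
  case $1 in
    sequential) printf 'hdfs dfs -cp %s %s\n' "$2" "$3" ;;
    parallel)   printf 'hadoop distcp %s %s\n' "$2" "$3" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the parallel variant for one directory of the archive.
unarchive_cmd parallel har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
```

A sequential copy is usually enough for small archives, while DistCp spreads the copy over a MapReduce job and tends to pay off for large ones.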

5 - Documentation / Reference

