Table of Contents

HDFS - Hadoop Archive file (har)

About

Code Shipping - Archive (Distribution) in the Hadoop context.

Concept

An archive:

Management

URI

har://scheme-hostname:port/archivepath/fileinarchive
har:///archivepath/fileinarchive

Create

with Hadoop

hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>

Unarchive

Since all the fs shell commands in the archives work transparently, unarchiving is just a matter of copying.

To unarchive:

hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir

Documentation / Reference