HDFS - Hadoop Archive file (har)

About

A Hadoop archive (file extension .har) is the archive (distribution) format in the Hadoop context.

Concept

An archive:

  • exposes itself as a file system layer. All the fs shell commands work on the files in an archive, but with a different URI.
  • is immutable. Rename, delete, and create commands return an error.

Management

URI

  • Qualified URI
har://scheme-hostname:port/archivepath/fileinarchive
  • URI on the default file system (host and port omitted)
har:///archivepath/fileinarchive
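
The two URI forms above can be illustrated with a small sketch that assembles them from their parts. The function names `har_uri` and `har_default_uri`, the host `namenode.example.com`, the port `8020`, and the entry `dir1/part-0` are hypothetical placeholders, not values from a real cluster.

```shell
#!/bin/sh
# Build a fully qualified har:// URI from its parts.
# All argument values used below are hypothetical placeholders.
har_uri() {
  scheme=$1    # scheme of the underlying file system, e.g. hdfs
  host=$2      # host of the underlying file system
  port=$3      # port of the underlying file system
  archive=$4   # absolute path of the .har archive
  file=$5      # path of the entry inside the archive
  printf 'har://%s-%s:%s%s/%s\n' "$scheme" "$host" "$port" "$archive" "$file"
}

# Build the short form that relies on the default file system.
har_default_uri() {
  archive=$1   # absolute path of the .har archive
  file=$2      # path of the entry inside the archive
  printf 'har://%s/%s\n' "$archive" "$file"
}

har_uri hdfs namenode.example.com 8020 /user/zoo/foo.har dir1/part-0
# prints har://hdfs-namenode.example.com:8020/user/zoo/foo.har/dir1/part-0
har_default_uri /user/zoo/foo.har dir1/part-0
# prints har:///user/zoo/foo.har/dir1/part-0
```

Either string can then be passed to an fs shell command such as hdfs dfs -ls.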

Create

With the hadoop client utility:

hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>
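
As a sketch of how the syntax above is filled in, the helper below assembles a hadoop archive command line and prints it instead of running it (a dry run). The function name `build_archive_cmd` and the paths `/user/hadoop`, `/user/zoo`, `dir1`, and `dir2` are illustrative assumptions. The check on the archive name reflects the convention that the name ends in .har.

```shell
#!/bin/sh
# Dry run: print the hadoop archive command instead of executing it.
# Usage: build_archive_cmd <name.har> <parent> <dest> <src>...
build_archive_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: build_archive_cmd <name.har> <parent> <dest> <src>..." >&2
    return 2
  fi
  name=$1      # archive name, expected to end in .har
  parent=$2    # parent path; the sources are relative to it
  dest=$3      # destination directory for the archive
  shift 3      # the remaining arguments are the source paths
  case $name in
    *.har) ;;
    *) echo "archive name must end in .har: $name" >&2; return 1 ;;
  esac
  echo "hadoop archive -archiveName $name -p $parent $* $dest"
}

build_archive_cmd foo.har /user/hadoop /user/zoo dir1 dir2
# prints hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo
```

With these hypothetical paths, the printed command would archive /user/hadoop/dir1 and /user/hadoop/dir2 into /user/zoo/foo.har.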

Unarchive

Since all the fs shell commands in the archives work transparently, unarchiving is just a matter of copying.

To unarchive sequentially:

hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir

To unarchive in parallel, use DistCp:

hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
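
The choice between the two copy commands can be sketched as a small dry-run helper: hdfs dfs -cp copies sequentially from the client, while hadoop distcp runs the copy as a parallel job. The function name `unarchive_cmd` and its mode names are hypothetical, and the helper only prints the command it would run.

```shell
#!/bin/sh
# Dry run: print the copy command that would unarchive src into dest.
# Usage: unarchive_cmd <sequential|parallel> <har-uri> <dest-uri>
unarchive_cmd() {
  mode=$1   # sequential -> hdfs dfs -cp, parallel -> hadoop distcp
  src=$2    # har:// URI of the directory or file inside the archive
  dest=$3   # target location on the underlying file system
  case $mode in
    sequential) echo "hdfs dfs -cp $src $dest" ;;
    parallel)   echo "hadoop distcp $src $dest" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

unarchive_cmd sequential har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
# prints hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
unarchive_cmd parallel har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
# prints hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
```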
