About
This article covers the archive (HAR) as a way to ship and distribute code in the Hadoop context.
Concept
An archive:
- exposes itself as a file system layer on top of HDFS. All fs shell commands work inside an archive, but with a different URI.
- is immutable. Rename, delete and create commands return an error.
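As a minimal sketch of the two points above, the script below lists the same location twice: once as a plain HDFS path and once through the har scheme. The archive path /user/zoo/foo.har is an assumed example, and the script skips the commands when the hdfs client is not installed.

```shell
#!/usr/bin/env bash
# Assumed example location of an existing archive on the default file system.
ARCHIVE=/user/zoo/foo.har

if command -v hdfs >/dev/null 2>&1; then
  # Raw view: the archive is a directory holding index files
  # (_index, _masterindex) and part-* data files.
  hdfs dfs -ls "$ARCHIVE"
  # Archive view: the same location through the har scheme
  # lists the files that were archived.
  hdfs dfs -ls "har://$ARCHIVE"
else
  echo "hdfs client not found on PATH, skipping"
fi
```

Only the URI changes between the two calls; the command itself stays the same.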
Management
URI
- Qualified URI (scheme is the scheme of the underlying file system, for example hdfs)
har://scheme-hostname:port/archivepath/fileinarchive
- Unqualified URI (uses the host and port of the default file system)
har:///archivepath/fileinarchive
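The two URI forms can be sketched with a small helper. The function name har_uri, the host namenode.example.com and the port 8020 are hypothetical and chosen only for illustration; the helper is not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper that builds a har URI from its parts.
# Usage: har_uri <archive_path> <file_in_archive> [scheme] [host:port]
har_uri() {
  local archive_path="$1" file="$2" scheme="${3:-}" authority="${4:-}"
  if [ -n "$scheme" ] && [ -n "$authority" ]; then
    # Qualified form: underlying scheme and host:port joined by a hyphen.
    printf 'har://%s-%s%s/%s\n' "$scheme" "$authority" "$archive_path" "$file"
  else
    # Unqualified form: empty authority, the default file system is used.
    printf 'har://%s/%s\n' "$archive_path" "$file"
  fi
}

har_uri /user/zoo/foo.har dir1/file.txt hdfs namenode.example.com:8020
# prints har://hdfs-namenode.example.com:8020/user/zoo/foo.har/dir1/file.txt
har_uri /user/zoo/foo.har dir1/file.txt
# prints har:///user/zoo/foo.har/dir1/file.txt
```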
Create
With the hadoop command line:
hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>
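A possible invocation of this syntax is sketched below. The wrapper create_har and the RUN variable are hypothetical conveniences: by default the wrapper only prints the command, and it runs the command when RUN=1. The paths are assumed examples. The archive name should end in .har, and the sources are given relative to the parent path.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around "hadoop archive".
# Usage: create_har <name.har> <parent> <dest> <src>...
create_har() {
  local name="$1" parent="$2" dest="$3"
  shift 3
  # Remaining arguments are the sources, relative to the parent path.
  local cmd=(hadoop archive -archiveName "$name" -p "$parent" "$@" "$dest")
  if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"        # really submit the archiving job
  else
    echo "${cmd[*]}"   # dry run: only show the command
  fi
}

# Archive /user/hadoop/dir1 and /user/hadoop/dir2 into /user/zoo/foo.har
create_har foo.har /user/hadoop /user/zoo dir1 dir2
# prints hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo
```

Printing first makes it easy to check the order of parent, sources and destination before starting the job.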
Unarchive
Since all the fs shell commands in the archives work transparently, unarchiving is just a matter of copying.
To unarchive:
- sequentially, with the hdfs command line:
hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
- in parallel, with DistCp:
hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir