HDFS - DistCp (distributed inter/intra-cluster copy)
About
DistCp (distributed copy) is a tool for copying large amounts of data between clusters (inter-cluster) or within a single cluster (intra-cluster).
Concept
DistCp is a MapReduce application and therefore runs in parallel. It expands a list of files and directories into input for map tasks, each of which copies a partition of the files specified in the source list.
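The partitioning idea can be pictured with a toy sketch. It is only an illustration and not DistCp's actual algorithm: it assigns paths to map tasks round-robin by count, whereas DistCp by default balances the work by bytes. The function name `partition_paths` and the `map-N` labels are hypothetical.

```shell
#!/bin/sh
# Toy illustration only: assign each path of a source list to one of N
# hypothetical map tasks, round-robin. Not DistCp's real splitting logic.
partition_paths() {
  # $1 = number of map tasks; paths are read from stdin, one per line
  awk -v n="$1" '{ printf "map-%d\t%s\n", (NR - 1) % n, $0 }'
}

# Five example paths spread over two map tasks.
printf '%s\n' /foo/bar/a /foo/bar/b /foo/bar/c /foo/bar/d /foo/bar/e |
  partition_paths 2
```

Each output line pairs a map task label with the path that task would copy.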
Management
Inter-cluster copy
With the hadoop client utility:
hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
where nn1 and nn2 are the NameNodes of the source and destination clusters.
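A minimal sketch of a wrapper around the same copy, assuming a configured Hadoop client is on the PATH. It adds two DistCp options: `-m`, which caps the number of simultaneous map tasks, and `-update`, which copies only files that are missing or differ at the destination. The function name `build_distcp_cmd` and the `RUN` variable are hypothetical helpers, not part of Hadoop.

```shell
#!/bin/sh
# Sketch: build a DistCp command line and print it; run it only when RUN=1.
# build_distcp_cmd and RUN are hypothetical names used for illustration.
build_distcp_cmd() {
  # $1 = source URI, $2 = destination URI, $3 = maximum number of map tasks
  printf 'hadoop distcp -update -m %s %s %s\n' "$3" "$1" "$2"
}

cmd=$(build_distcp_cmd hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo 20)
echo "$cmd"

# Execute only on explicit request, since this needs a working Hadoop client.
if [ "${RUN:-0}" = "1" ]; then
  $cmd
fi
```

Note that `-update` changes path semantics: the contents of the source directory are copied into the destination, not the source directory itself, so it is worth checking the printed command before running it.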
Example
Between S3 and HDFS:
hadoop distcp s3n://AWS_SECRET_ID:[email protected]/enron-email hdfs:///tmp/enron
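On recent Hadoop releases the s3n connector is no longer available, and the s3a connector is the usual replacement. The following is a hedged sketch of the same copy through s3a, assuming the hadoop-aws module is installed. The bucket name `my-bucket`, the helper `s3a_uri`, and the `RUN` variable are hypothetical; the credentials are read from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead of being embedded in the URI.

```shell
#!/bin/sh
# Sketch: S3-to-HDFS copy through the s3a connector.
# "my-bucket", s3a_uri and RUN are hypothetical names for illustration.
s3a_uri() {
  # $1 = bucket name, $2 = key prefix inside the bucket
  printf 's3a://%s/%s\n' "$1" "$2"
}

src=$(s3a_uri my-bucket enron-email)
dst=hdfs:///tmp/enron

# Print the copy that would run, without the credentials.
echo "hadoop distcp $src $dst"

# Execute only on explicit request; needs Hadoop and valid AWS credentials.
if [ "${RUN:-0}" = "1" ]; then
  hadoop distcp \
    -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
    -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
    "$src" "$dst"
fi
```

Passing secrets with `-D` keeps them out of the URI but still exposes them in the process list, so a credential provider or instance role may be preferable where available.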