MapReduce - InputFormat

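The default behavior of file-based InputFormats, typically subclasses of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files. By default, the FileSystem block size of the input files is treated as an upper bound on split size.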
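
The split arithmetic can be sketched as below. This is a simplified Python model, not Hadoop's code. `compute_split_size`, `get_splits` and `Split` are hypothetical names, and file lengths are passed in as a dict instead of being read from a FileSystem. Details such as non-splittable files, empty files, and the small slack Hadoop allows on the final split are ignored. The size rule `max(minSize, min(maxSize, blockSize))` mirrors the one FileInputFormat uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    path: str
    start: int   # byte offset where the split begins
    length: int  # number of bytes covered by the split


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # The block size is the effective upper bound unless max_size is smaller;
    # min_size can push splits above the block size.
    return max(min_size, min(max_size, block_size))


def get_splits(files: dict[str, int], block_size: int,
               min_size: int = 1, max_size: int = 2**63 - 1) -> list[Split]:
    """Cut each file into logical splits by byte range.

    `files` maps a (hypothetical) path to its length in bytes. Simplified:
    ignores non-splittable files, empty files, and the slack Hadoop allows
    on the final split.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    for path, file_len in files.items():
        offset = 0
        while offset < file_len:
            # Every split is split_size bytes except possibly the last one.
            length = min(split_size, file_len - offset)
            splits.append(Split(path, offset, length))
            offset += length
    return splits


# Toy 128-byte "block" so the numbers stay small.
for s in get_splits({"a.txt": 300, "b.txt": 100}, block_size=128):
    print(s)
```

Here a 300-byte file yields splits starting at offsets 0, 128 and 256, while the 100-byte file fits in one split. Raising `min_size` above the block size would produce fewer, larger splits.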

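  • FileInputFormat (org.apache.hadoop.mapreduce.lib.input.FileInputFormat) - abstract base class for file-based InputFormats; it lists the input files and divides them into InputSplits (a large file can produce several splits), leaving record parsing to subclasses
  • TextInputFormat - one record per line: the key is the line's byte offset in the file (LongWritable) and the value is the line's text (Text)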
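
Because splits are cut at byte offsets, a line can straddle two splits. The sketch below shows how a line-oriented reader can still deliver each line exactly once. It is modeled on the behavior of Hadoop's LineRecordReader and makes several assumptions: `\n` line endings, the whole file held in memory as bytes, and a hypothetical `read_line_records` function. A split that does not start at byte 0 discards its first line, and every split keeps reading lines that start at or before its end offset.

```python
def read_line_records(data: bytes, start: int, end: int) -> list[tuple[int, str]]:
    """Return (byte offset, line) records for the split covering bytes [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: skip up to and including the first newline.
        # That (possibly partial) line belongs to the previous split.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Read every line that *starts* at or before `end`, even if it runs past
    # `end`; the next split skips that line, so each line is read once.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append((pos, data[pos:line_end].decode("utf-8")))
        pos = line_end + 1
    return records


data = b"alpha\nbeta\ngamma\ndelta\n"
cut = 8  # split boundary falls in the middle of "beta"
print(read_line_records(data, 0, cut))          # [(0, 'alpha'), (6, 'beta')]
print(read_line_records(data, cut, len(data)))  # [(11, 'gamma'), (17, 'delta')]
```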

The MapReduce framework must be able to serialize key and value types, so they implement the Writable interface; key types must additionally implement the WritableComparable interface so the framework can sort them.
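
As a rough illustration of the two contracts, here is a Python analogue of a composite key. It is not Hadoop's API. The class `YearStationKey` and its `write`/`read_fields` methods are hypothetical stand-ins for Writable's `write(DataOutput)` and `readFields(DataInput)`, and the comparison methods play the role of `compareTo` from WritableComparable. The byte layout (a big-endian int followed by a length-prefixed UTF-8 string) is an assumption chosen for the example.

```python
import io
import struct
from functools import total_ordering
from typing import BinaryIO


@total_ordering
class YearStationKey:
    """Hypothetical composite key mimicking a Hadoop WritableComparable."""

    def __init__(self, year: int = 0, station: str = ""):
        self.year = year
        self.station = station

    def write(self, out: BinaryIO) -> None:
        # Serialize: 4-byte big-endian int, 2-byte length, then UTF-8 bytes.
        encoded = self.station.encode("utf-8")
        out.write(struct.pack(">iH", self.year, len(encoded)))
        out.write(encoded)

    def read_fields(self, inp: BinaryIO) -> None:
        # Like Writable.readFields, refill this object in place so a
        # framework can reuse one instance for many records.
        self.year, n = struct.unpack(">iH", inp.read(6))
        self.station = inp.read(n).decode("utf-8")

    def _order(self) -> tuple[int, str]:
        return (self.year, self.station)

    # Comparison plays the role of compareTo: sort by year, then station.
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, YearStationKey):
            return NotImplemented
        return self._order() == other._order()

    def __lt__(self, other: "YearStationKey") -> bool:
        if not isinstance(other, YearStationKey):
            return NotImplemented
        return self._order() < other._order()

    def __hash__(self) -> int:
        return hash(self._order())

    def __repr__(self) -> str:
        return f"YearStationKey({self.year}, {self.station!r})"


buf = io.BytesIO()
YearStationKey(1950, "st-01").write(buf)
buf.seek(0)
key = YearStationKey()
key.read_fields(buf)
print(key)  # YearStationKey(1950, 'st-01')
print(sorted([YearStationKey(1951, "a"), YearStationKey(1949, "z"), YearStationKey(1949, "b")]))
```
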
In a Hadoop application, the Mapper maps input key/value pairs to a set of intermediate key/value pairs.
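
A word-count mapper is the usual illustration. In this hypothetical Python `word_count_map`, the input pair is (byte offset, line), as TextInputFormat would supply it. The output is zero or more (word, 1) pairs, so both the number and the types of the pairs change, which the Mapper contract allows. The tokenization rule (lowercased runs of letters and apostrophes) is an arbitrary choice for the sketch.

```python
import re
from typing import Iterator

WORD = re.compile(r"[a-z']+")


def word_count_map(offset: int, line: str) -> Iterator[tuple[str, int]]:
    """Hypothetical mapper: (byte offset, line) -> zero or more (word, 1)."""
    # The offset key is not needed for word counting, so it is ignored.
    for word in WORD.findall(line.lower()):
        yield word, 1


print(list(word_count_map(0, "The cat saw the dog.")))
# [('the', 1), ('cat', 1), ('saw', 1), ('the', 1), ('dog', 1)]
print(list(word_count_map(20, "")))  # []
```
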
A MapReduce application implements a pipeline in which:
  • the input is turned into a stream of key/value pairs
  • the pairs are processed in parallel by map operations
  • the intermediate results are then combined and shuffled (sorted and grouped by key) before the reduce phase
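
The pipeline can be simulated end to end in a few lines. This sketch assumes a toy max-temperature job on hypothetical `year,temperature` lines. Threads stand in for parallel map tasks, whereas Hadoop runs map tasks as separate processes across the cluster. There is a single reducer, so the sketch skips partitioning keys across reducers. All function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def map_split(lines: list[str]) -> list[tuple[str, int]]:
    # Map: parse each "year,temperature" record into an intermediate pair.
    pairs = []
    for line in lines:
        year, temp = line.split(",")
        pairs.append((year, int(temp)))
    return pairs


def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    # Combine: keep only the local maximum per year before the shuffle,
    # which shrinks the data sent to the reducer.
    best: dict[str, int] = {}
    for year, temp in pairs:
        best[year] = max(temp, best.get(year, temp))
    return list(best.items())


def shuffle(outputs: list[list[tuple[str, int]]]) -> list[tuple[str, list[int]]]:
    # Shuffle: merge every map task's output, sort by key, group values by key.
    merged = sorted((pair for out in outputs for pair in out), key=itemgetter(0))
    return [(year, [t for _, t in group])
            for year, group in groupby(merged, key=itemgetter(0))]


def reduce_max(year: str, temps: list[int]) -> tuple[str, int]:
    # Reduce: one call per key, with all of that key's values.
    return year, max(temps)


def run_job(splits: list[list[str]]) -> list[tuple[str, int]]:
    # Threads stand in for map tasks running in parallel on separate splits.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda s: combine(map_split(s)), splits))
    return [reduce_max(year, temps) for year, temps in shuffle(mapped)]


splits = [["1949,111", "1950,0", "1949,78"],
          ["1950,22", "1950,-11", "1951,5"]]
print(run_job(splits))  # [('1949', 111), ('1950', 22), ('1951', 5)]
```

Running the combiner is safe here because max is associative and commutative. For an aggregate such as an average, a combiner would need to carry partial sums and counts instead.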

