MapReduce - InputFormat

About

The default behavior of file-based InputFormats, typically sub-classes of FileInputFormat, is to split the input into logical InputSplits based on the total size, in bytes, of the input files.
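The size-based split described above can be sketched as a simplified Java model. This is not the Hadoop class itself. It assumes the split-size formula used by Hadoop's FileInputFormat, max(minSize, min(maxSize, blockSize)), and its 10% "slop" rule for the last split. The class and method names (SplitSizeSketch, computeSplitSize, splitFile) are hypothetical, and block locations, compression and non-splittable files are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of how a FileInputFormat-style splitter
 * carves one file into byte-range InputSplits. Block locations, compression
 * and non-splittable files are deliberately ignored.
 */
public class SplitSizeSketch {
    // Assumed slop factor (1.1, as in Hadoop's FileInputFormat): the last split
    // may be up to 10% larger than the target size instead of becoming a tiny split.
    static final double SPLIT_SLOP = 1.1;

    /** Split size = max(minSize, min(maxSize, blockSize)). */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {offset, length} pairs covering a file of fileLength bytes. */
    public static List<long[]> splitFile(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        // Emit full-size splits while what is left is clearly more than one split.
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - bytesRemaining, splitSize});
            bytesRemaining -= splitSize;
        }
        // The remainder (at most 1.1 x splitSize) becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(new long[] {fileLength - bytesRemaining, bytesRemaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 128 MB block size, default min (1 byte) and max (Long.MAX_VALUE) split sizes
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        for (long[] s : splitFile(300 * mb, splitSize)) {
            System.out.println("offset=" + s[0] / mb + "MB length=" + s[1] / mb + "MB");
        }
    }
}
```

With a 128 MB block size, a 300 MB file yields splits of 128, 128 and 44 MB, while a 140 MB file stays a single split because 140/128 is below the 1.1 slop factor. In Hadoop, minSize and maxSize come from the mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize properties.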

Example

InputFormat:

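  • org.apache.hadoop.mapreduce.lib.input.FileInputFormat - abstract base class of the file-based InputFormats; it splits the input files into InputSplits
  • TextInputFormat - one record per line (key: byte offset of the line in the file, value: the content of the line)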
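To make the TextInputFormat record shape concrete, here is a toy model written without Hadoop. It produces the (byte offset, line) pairs a Mapper would see for a small in-memory input. The class LineRecordsSketch and its readLines method are hypothetical and do not reproduce Hadoop's LineRecordReader: they handle only '\n' and "\r\n" terminators and ignore split boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of TextInputFormat records: one record per line,
 * key = byte offset of the line start, value = line text without terminator.
 */
public class LineRecordsSketch {
    public static final class LineRecord {
        public final long offset;  // byte offset of the first byte of the line
        public final String text;  // line content, terminator stripped

        LineRecord(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }

        @Override
        public String toString() {
            return "(" + offset + ", \"" + text + "\")";
        }
    }

    /** Splits a whole buffer into line records ('\n' or "\r\n" terminated). */
    public static List<LineRecord> readLines(byte[] data) {
        List<LineRecord> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                // Drop a preceding '\r' so "\r\n" endings are handled too.
                int end = (i > start && data[i - 1] == '\r') ? i - 1 : i;
                out.add(new LineRecord(start, new String(data, start, end - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        // Last line without a trailing newline.
        if (start < data.length) {
            out.add(new LineRecord(start, new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = "hello world\nfoo bar\nbaz".getBytes(StandardCharsets.UTF_8);
        for (LineRecord r : readLines(input)) {
            System.out.println(r);
        }
    }
}
```

For the sample input the records are (0, "hello world"), (12, "foo bar") and (20, "baz"): the key is a byte position in the file, not a line number.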