MapReduce - Input Split

(Figure: MapReduce pipeline)


An input split is a logical representation of the data stored in file blocks.

Each input split represents the data to be processed by a single mapper: the number of input splits calculated for a job determines the number of map tasks.
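To make the "one split, one map task" relationship concrete, here is a minimal plain-Java sketch, modelled on the split-size logic of Hadoop's FileInputFormat for a single splittable file. The class `SplitCalculator`, the `Split` record (a stand-in for the fields a FileSplit carries) and the example path are hypothetical. The defaults `minSize = 1` and `maxSize = Long.MAX_VALUE` are assumed to match an unconfigured job. No Hadoop dependency is needed.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCalculator {
    // Hypothetical stand-in for the information a FileSplit records.
    record Split(String path, long start, long length) {}

    // Slack factor: the last split may be up to 10% larger than splitSize
    // instead of producing a tiny extra split (mirrors FileInputFormat).
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between minSize and maxSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<Split> getSplits(String path, long fileLength,
                                 long blockSize, long minSize, long maxSize) {
        List<Split> splits = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long bytesRemaining = fileLength;
        // Cut full-size splits while what remains is clearly bigger than one split.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(path, fileLength - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        // Whatever is left (at most 1.1 x splitSize) becomes the final split.
        if (bytesRemaining != 0) {
            splits.add(new Split(path, fileLength - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file stored in 128 MB blocks.
        List<Split> splits = getSplits("/data/input.txt", 300 * mb, 128 * mb, 1, Long.MAX_VALUE);
        System.out.println("map tasks: " + splits.size());
        for (Split s : splits) {
            System.out.println(s);
        }
    }
}
```

For the 300 MB file this yields three splits (128 MB, 128 MB and 44 MB), so the job would run three map tasks. Because of the 1.1 slack factor, a 140 MB file would produce a single split and therefore a single map task.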

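The MapReduce job client calculates the input splits. A split is defined by byte offsets, so its boundaries do not have to line up with record boundaries. The RecordReader that processes each split works out where the first complete record begins and where the last record ends, and then breaks the split down further into individual records.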
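The boundary handling can be sketched in plain Java as follows. This is a simplified model of the rule used by Hadoop's line-oriented LineRecordReader. A split that does not start at offset 0 skips its first (possibly partial) line, because the previous split reads it. Every split keeps reading while a record starts at or before the split's end, even if that record finishes in the next split. The class `SplitRecords` and the method `readSplit` are hypothetical. The sketch ignores compression, `\r\n` line endings and custom delimiters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitRecords {
    // Returns the newline-terminated records owned by the split [start, start + length).
    static List<String> readSplit(byte[] data, int start, int length) {
        List<String> records = new ArrayList<>();
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Skip up to and including the first newline at or after start:
            // the record straddling this boundary belongs to the previous split.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        // Read every record that starts at or before the split end.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // deliberately small so splits cut records in half
        for (int start = 0; start < data.length; start += splitSize) {
            int length = Math.min(splitSize, data.length - start);
            System.out.println("split@" + start + " -> " + readSplit(data, start, length));
        }
    }
}
```

With 8-byte splits over the 20-byte input, the first split returns `alpha` and `bravo` (finishing `bravo` past its end), the second skips the tail of `bravo` and returns `charlie`, and the third returns nothing. Each record is therefore read exactly once.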



FileSplit is the default InputSplit. It records the path of the input file for the logical split, together with the split's start offset and length.

