About

An input split is a logical representation of the data stored in file blocks.

It represents the data to be processed by an individual Mapper, so the number of input splits calculated for an application determines the number of mapper tasks.
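The relation between file size, split size and mapper count can be sketched in plain Java, without any Hadoop dependency. The class and method names here (SplitSketch, Split, computeSplitSize, computeSplits) are hypothetical and only for illustration. The sketch assumes the split size is the block size clamped between a configurable minimum and maximum, and it ignores details such as block locations and any slack a real implementation may allow for the last split.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    /** Hypothetical stand-in for a file split: a byte range inside one file. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split[start=" + start + ", length=" + length + "]";
        }
    }

    /** Assumption: the split size is the block size clamped to [minSize, maxSize]. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Cuts a file of fileLength bytes into consecutive byte ranges of at most splitSize bytes. */
    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Split(offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        List<Split> splits = computeSplits(300 * mb, splitSize);
        // One map task is started per split.
        System.out.println(splits.size() + " splits, so " + splits.size() + " map tasks");
        for (Split split : splits) {
            System.out.println(split);
        }
    }
}
```

With a 300 MB file and a 128 MB split size, the sketch yields three splits (128 MB, 128 MB and 44 MB), and therefore three mapper tasks.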

The MapReduce job client calculates the input splits by figuring out where the first whole record in a block begins and where the last record in the block ends. A RecordReader then processes each split and breaks it down further into individual records.
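The record-boundary handling can be illustrated with a small, self-contained Java sketch for newline-delimited records. It is not Hadoop's RecordReader. The name LineSplitReaderSketch and its methods are hypothetical, and the rule it applies is an assumption modeled on how line-oriented readers commonly behave: a reader whose split does not start at byte 0 skips its first (possibly partial) line, and every reader keeps reading until it has finished the last line that begins at or before the end of its split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineSplitReaderSketch {
    /** Returns the lines that belong to the byte range [start, start + length) of data. */
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        // A split that does not begin the file skips its first line:
        // the reader of the previous split is responsible for it.
        if (start != 0) {
            pos = nextLineStart(data, pos);
        }
        List<String> records = new ArrayList<>();
        // Read every line that begins at or before `end`, even if it ends past the split.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            // Exclude the trailing newline, if there is one, from the record.
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    /** Index just past the next '\n' at or after `from`, or data.length if there is none. */
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        // Three fixed-size byte ranges that cut through the middle of lines.
        System.out.println(readSplit(data, 0, 8));
        System.out.println(readSplit(data, 8, 8));
        System.out.println(readSplit(data, 16, 4));
    }
}
```

Although the byte ranges cut "bravo" and "charlie" in half, each line is returned by exactly one reader: the first range yields alpha and bravo, the second yields charlie, and the third yields nothing.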

Management

Default

FileSplit is the default InputSplit. It sets the mapreduce.map.input.file property to the path of the input file for the logical split.