MapReduce - Input Split


About

An input split is a logical representation of the data stored in file blocks.

It represents the data to be processed by an individual Mapper: the number of input splits calculated for an application determines the number of mapper tasks.

The MapReduce job client calculates the input splits, figuring out where the first record in a block begins and where the last record in the block ends. A RecordReader then processes each split and breaks it down further into records.
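The split calculation can be illustrated with a minimal, Hadoop-free sketch. It assumes the size formula used by FileInputFormat, max(minSize, min(maxSize, blockSize)), and it ignores the small slack that Hadoop allows on the last split. The class SplitSketch, its nested Split class, and the path /data/input.txt are hypothetical names made up for this illustration; they are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for a file split: a logical (path, start, length)
    // triple that points at the data without copying it.
    public static final class Split {
        public final String path;
        public final long start;
        public final long length;

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    // Clamp the block size between the configured minimum and maximum split size.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Cut a file of fileLength bytes into consecutive byte ranges of at most splitSize.
    // Simplification: no slack on the last split, no host/locality information.
    public static List<Split> splitsFor(String path, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Split(path, offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: 128 MB blocks, no effective min/max constraint.
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        List<Split> splits = splitsFor("/data/input.txt", 300 * mb, splitSize);
        // One map task is started per split.
        System.out.println(splits.size() + " splits, so " + splits.size() + " map tasks");
        for (Split s : splits) {
            System.out.println(s);
        }
    }
}
```

With the assumed 300 MB file and 128 MB blocks, the sketch yields three splits (128 MB, 128 MB and 44 MB), and therefore three mapper tasks. The splits are plain byte ranges; aligning them to record boundaries is left to the RecordReader.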

Management

Default

FileSplit is the default InputSplit. It sets mapreduce.map.input.file to the path of the input file for the logical split.







