
MapReduce - Map (Mapper)

About

The Map implementation in a Hadoop application.

Mapper maps input key/value pairs to a set of intermediate key/value pairs.

Maps are the individual tasks that transform input records into intermediate records.
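As a rough illustration of this transformation, the sketch below maps one input record (a byte offset and a line of text) to a list of intermediate (word, 1) pairs, in the style of a word count. It is plain Java and deliberately does not use the Hadoop Mapper API; the class name MapSketch and its map method are hypothetical names chosen for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a map function (assumption: not the Hadoop API).
class MapSketch {

    // Maps one input record (byte offset, line of text) to zero or more
    // intermediate (word, 1) pairs, as a word-count style mapper would.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {          // a blank line yields no output
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One input record produces three intermediate records.
        System.out.println(map(0L, "to be to")); // prints [to=1, be=1, to=1]
    }
}
```

Note that the input key (the offset) is ignored here: an intermediate record does not have to keep the type or the key of the input record, and one input record may produce zero or many outputs.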

Implementation

Applications implement this map function and:

The Mapper outputs are:

Management

Number

The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job.

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

Example: 10 TB of input data with a block size of 128 MB (both sizes expressed in MB):

<MATH> \text{number of maps} = \frac{10 \times 1024 \times 1024}{128} = 81\,920 </MATH>
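The same arithmetic can be checked with a few lines of code. The sketch below assumes one map per block, 10 TB of input and a 128 MB block size; the class MapCountEstimate and the method estimateMaps are hypothetical helpers, not part of Hadoop. It rounds up because a final partial block still needs a map of its own.

```java
// Back-of-the-envelope estimate of the number of map tasks
// (assumption: one split per block, input treated as one large file).
class MapCountEstimate {

    // Returns ceil(totalBytes / splitBytes) using integer arithmetic.
    static long estimateMaps(long totalBytes, long splitBytes) {
        if (totalBytes < 0 || splitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (totalBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;   // bytes in 1 MB
        long tb = mb * mb;         // bytes in 1 TB (1024^4)
        System.out.println(estimateMaps(10 * tb, 128 * mb)); // prints 81920
    }
}
```

With many small files the real count is higher than this estimate, since each file produces at least one split.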

The number of maps can be set higher with Configuration.set(MRJobConfig.NUM_MAPS, int), but this only gives the framework a hint.
