# Spark - (Map|flatMap)

The map implementation in Spark of map reduce.

• map(func) returns a new distributed data set that's formed by passing each element of the source through a function.
• flatMap(func) similar to map but flatten a collection object to a sequence.

## Example

### Multiplication

``````rdd = sc.parallelize([1,2,3,4])
rdd.map(lambda x: x * 2).collect()```
```
``````[2,4,6,8]
```
```
• def function
``````rdd = sc.parallelize([1,2,3,4])
def plus5(x):
return x+5
rdd.map(plus5).collect()```
```
``````[6, 7, 8, 9]
```
```

### flatMap

#### On 1 list

``````rdd = sc.parallelize([1,2,3])
rdd.flatMap(lambda x:[x,x+5]).collect()```
```
``````[1,6,2,7,3,8]
```
```

#### On a list of list

````sc.parallelize([[1,2],[3,4]]).flatMap(lambda x:x).collect()`
```
``````[1, 2, 3, 4]
```
```

### Key Value

key value (tuple) transformation are supported

``````from operator import add
wordCounts = sc.parallelize([('rat', 2), ('elephant', 1), ('cat', 2)])
totalCount = (wordCounts
.map(lambda (x,y): y)
average = totalCount / float(len(wordCounts.collect()))
print totalCount
print round(average, 2)```
```
``````5
1.67
```
```

Discover More
RDD - Calling a Worker (Local|External) Process

How to call a (forked) external process from Spark Example with a
Spark - (RDD) Transformation

transformation function in RDD Transformations Description filter returns a new data set that's formed by selecting those elements of the source on which a function returns true. distinct([numTasks]))...
Spark RDD - String

Add the line number as a value of a tuple ? where: