About

filter(func) is a transformation that returns a new data set (RDD) containing only those elements of the source for which func returns true.

Example

Modulo

rdd.filter(lambda x: x % 2 == 0)
[1, 2, 3, 4] → [2, 4]
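
The same filter can be sketched end to end with a named predicate instead of a lambda. This is only an illustration: is_even and even_numbers are hypothetical names, and sc is assumed to be an already created SparkContext. The last line checks the predicate locally, without Spark.

```python
def is_even(x):
    """Predicate: True when x is divisible by 2."""
    return x % 2 == 0


def even_numbers(sc, data):
    """Keep the even elements of data.

    sc is assumed to be an existing SparkContext (not created here).
    """
    rdd = sc.parallelize(data)             # build an RDD from a local list
    return rdd.filter(is_even).collect()   # collect() brings the result back to the driver


# Local check of the predicate alone, no Spark needed:
print([x for x in [1, 2, 3, 4] if is_even(x)])  # prints [2, 4]
```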

Text

lines = sc.textFile("...", 4)
comments = lines.filter(isComment)
# where isComment is a function that returns a boolean
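
The source does not show isComment, so the following is only one possible sketch. It assumes that a comment is a line whose first non-blank character is '#'. Python's built-in filter stands in for the RDD here so that the predicate can be tried without a cluster.

```python
def isComment(line):
    # Assumption (not from the source): a comment line starts with '#'
    # once leading whitespace is ignored.
    return line.lstrip().startswith("#")


# Hypothetical sample data standing in for the lines of a text file.
sample = ["# header", "value = 1", "   # indented note", ""]

# With Spark, the same predicate would be passed as lines.filter(isComment).
print(list(filter(isComment, sample)))  # prints ['# header', '   # indented note']
```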