About
filter(func) is a transformation that returns a new data set (RDD) containing only the elements of the source for which func returns true.
Example
Modulo
rdd.filter(lambda x: x % 2 == 0)
[1,2,3,4] → [2,4]
Text
lines = sc.textFile("...",4)
comments = lines.filter(isComment)
# where isComment is a function that returns a boolean
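The page does not define isComment, so here is a minimal sketch of one possible predicate. It assumes a comment is any line whose first non-blank character is "#"; that convention, the function body, and the sample lines are illustrative assumptions, not part of the original example. The predicate is checked locally with Python's built-in filter(), so the snippet runs without a Spark installation.

```python
def isComment(line: str) -> bool:
    """Return True when the line, ignoring leading whitespace, starts with '#'.

    Assumption: '#' marks a comment; adapt this to the syntax of your files.
    """
    return line.lstrip().startswith("#")


# Local check with Python's built-in filter(); no Spark needed.
# The sample lines are made up for illustration.
sample = ["# header", "x = 1", "   # indented comment", ""]
comment_lines = list(filter(isComment, sample))
print(comment_lines)  # ['# header', '   # indented comment']
```

The same function can be passed unchanged to lines.filter(isComment); Spark then evaluates it on each element of the RDD once an action such as collect() or count() is called.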