
Spark - (RDD) Transformation

About

A transformation is an RDD function that returns a new RDD built from an existing one; the source RDD itself is never modified.
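Transformations in Spark are lazy. Calling one only records it in the RDD's lineage, and nothing is computed until an action such as `collect()` or `count()` asks for a result. The sketch below is a toy, single-process model of that behaviour, not Spark's implementation. The `ToyRDD` class, its `_with_step` helper and the `square` function are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Step = Callable[[Iterator], Iterator]


class ToyRDD:
    """Hypothetical stand-in for an RDD (not Spark's class).

    Transformations return a NEW ToyRDD that only records a step;
    nothing runs until the collect() action is called.
    """

    def __init__(self, source: Iterable, steps: Optional[List[Step]] = None):
        self._source = source
        self._steps: List[Step] = steps or []

    def _with_step(self, step: Step) -> "ToyRDD":
        # The parent is left unchanged, mirroring RDD immutability.
        return ToyRDD(self._source, self._steps + [step])

    def map(self, f) -> "ToyRDD":
        return self._with_step(lambda it: (f(x) for x in it))

    def filter(self, pred) -> "ToyRDD":
        return self._with_step(lambda it: (x for x in it if pred(x)))

    def collect(self) -> list:
        # The action: only now are the recorded steps applied, in order.
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)


calls = []


def square(x):
    calls.append(x)  # records every element that is actually processed
    return x * x


base = ToyRDD([1, 2, 3, 4])
squared_evens = base.filter(lambda x: x % 2 == 0).map(square)
print(calls)                    # [] : defining the chain computed nothing
print(squared_evens.collect())  # [4, 16]
print(calls)                    # [2, 4]
print(base.collect())           # [1, 2, 3, 4] : the source is unchanged
```

The `calls` list shows that `square` only runs inside `collect()`, and only on the elements that survive the filter. Real Spark also splits the work into partitions and tasks, which this model ignores.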

List

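Transformation | Description
filter(func) | Returns a new data set formed by selecting the elements of the source for which func returns true.
distinct([numTasks]) | Returns a new data set that contains the distinct elements of the source data set. The optional numTasks sets the number of partitions (tasks) of the result.
map(func) | Returns a new distributed data set formed by passing each element of the source through func; each input element produces exactly one output element.
flatMap(func) | Like map, but func returns a sequence for each element and the sequences are flattened, so each input element can produce zero or more output elements.
zip(other), zipWithIndex(), zipWithUniqueId() | zip returns key-value pairs made of the i-th element of each RDD: <math>(rdd1_i, rdd2_i) \quad \forall i \in \{0, \dots, N-1\}</math>, where N is the number of elements in each RDD. Both RDDs must have the same number of partitions and the same number of elements in each partition. zipWithIndex and zipWithUniqueId instead pair each element with its index or with a unique id.
split (randomSplit(weights, [seed])) | Randomly splits the data set into several data sets according to the given weights.
pipe(command) | Pipes each partition through an external command, such as a shell script: the elements are written as lines to the process's stdin, and the lines it writes to stdout are returned as an RDD of strings.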
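The element-wise transformations in the table can be modelled without Spark. The sketch below is a toy, single-process model, not Spark's API. It assumes a data set is a plain Python list of partitions, each partition being a list, and the helper names `rdd_filter`, `rdd_map`, `rdd_flat_map` and `rdd_distinct` are hypothetical. The comments name the real RDD method each helper imitates.

```python
# Toy model (not Spark): a data set is a list of partitions, each a list.

def rdd_filter(parts, pred):
    # Like rdd.filter(func): keep the elements for which pred returns True.
    # Each partition is filtered independently; no data moves.
    return [[x for x in p if pred(x)] for p in parts]


def rdd_map(parts, f):
    # Like rdd.map(func): exactly one output element per input element.
    return [[f(x) for x in p] for p in parts]


def rdd_flat_map(parts, f):
    # Like rdd.flatMap(func): f returns an iterable whose items are flattened,
    # so one input element can produce zero, one or many output elements.
    return [[y for x in p for y in f(x)] for p in parts]


def rdd_distinct(parts, num_partitions=None):
    # Like rdd.distinct([numTasks]): duplicates are removed across ALL
    # partitions, which in Spark needs a shuffle. Here each kept element is
    # hash-partitioned into num_partitions outputs (default: same count).
    n = num_partitions or len(parts)
    out = [[] for _ in range(n)]
    seen = set()
    for p in parts:
        for x in p:
            if x not in seen:
                seen.add(x)
                out[hash(x) % n].append(x)
    return out


lines = [["a b", "c"], ["", "a b c"]]
print(rdd_filter(lines, bool))    # [['a b', 'c'], ['a b c']]
print(rdd_map(lines, len))        # [[3, 1], [0, 5]]
print(rdd_map(lines, str.split))  # [[['a', 'b'], ['c']], [[], ['a', 'b', 'c']]]
words = rdd_flat_map(lines, str.split)
print(words)                      # [['a', 'b', 'c'], ['a', 'b', 'c']]
print(sorted(x for p in rdd_distinct(words) for x in p))  # ['a', 'b', 'c']
```

Applying `str.split` with `rdd_map` keeps one list per input line, while `rdd_flat_map` flattens those lists into individual words; that is the whole difference between map and flatMap.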
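The pairing performed by zip and its index and id variants can be sketched with the same toy representation (a list of partitions, not Spark's API). The helpers `rdd_zip`, `rdd_zip_with_index` and `rdd_zip_with_unique_id` are hypothetical; the partition rules in their comments follow the behaviour described for `zip`, `zipWithIndex` and `zipWithUniqueId` in the Spark RDD API documentation.

```python
# Toy model (not Spark): a data set is a list of partitions, each a list.

def rdd_zip(left, right):
    # Like rdd1.zip(rdd2): pair the i-th element of each side.
    # Spark requires the same number of partitions and the same number of
    # elements in each partition; this sketch raises ValueError otherwise.
    if len(left) != len(right):
        raise ValueError("RDDs have a different number of partitions")
    out = []
    for lp, rp in zip(left, right):
        if len(lp) != len(rp):
            raise ValueError("partitions have a different number of elements")
        out.append(list(zip(lp, rp)))
    return out


def rdd_zip_with_index(parts):
    # Like rdd.zipWithIndex(): (element, index), where indices follow the
    # partition order first, then the position inside each partition.
    out, offset = [], 0
    for p in parts:
        out.append([(x, offset + j) for j, x in enumerate(p)])
        offset += len(p)
    return out


def rdd_zip_with_unique_id(parts):
    # Like rdd.zipWithUniqueId(): with n partitions, the j-th element of
    # partition k gets id k + j * n. Ids are unique but can have gaps.
    n = len(parts)
    return [[(x, k + j * n) for j, x in enumerate(p)] for k, p in enumerate(parts)]


nums = [[1, 2], [3, 4, 5]]
squares = [[x * x for x in p] for p in nums]  # same shape, as after a map
print(rdd_zip(nums, squares))  # [[(1, 1), (2, 4)], [(3, 9), (4, 16), (5, 25)]]

letters = [["a", "b"], ["c", "d", "e"]]
print(rdd_zip_with_index(letters))      # [[('a', 0), ('b', 1)], [('c', 2), ('d', 3), ('e', 4)]]
print(rdd_zip_with_unique_id(letters))  # [[('a', 0), ('b', 2)], [('c', 1), ('d', 3), ('e', 5)]]
```

The index variant needs the size of every earlier partition before it can number a partition, whereas the unique-id variant only needs the partition number and the partition count, at the price of gaps in the ids.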
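Spark's RDD API exposes splitting as `randomSplit(weights, seed)`, which returns a list of RDDs. The sketch below is a minimal plain-Python imitation of its visible behaviour, assuming a single local collection rather than partitions: the weights are normalized, every element lands in exactly one split, and a fixed seed makes the result reproducible. Spark's own implementation works partition by partition, and the `random_split` function here is hypothetical.

```python
import random
from itertools import accumulate


def random_split(elements, weights, seed=None):
    # Sketch of rdd.randomSplit(weights, seed): weights are normalized so they
    # sum to 1, and every element lands in exactly one of the returned splits.
    total = sum(weights)
    upper_bounds = list(accumulate(w / total for w in weights))
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for x in elements:
        u = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(upper_bounds):
            if u < upper:
                splits[i].append(x)
                break
        else:
            # Floating-point rounding can leave the last bound just below 1.
            splits[-1].append(x)
    return splits


train, test = random_split(range(100), [0.8, 0.2], seed=42)
print(len(train), len(test))  # roughly 80 and 20; exact sizes depend on the seed
```

Because each element draws a single random number, the split sizes are only approximately proportional to the weights, which matches the "approximately" wording usually attached to `randomSplit`.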
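pipe runs an external program once per partition and streams the partition through it as lines of text. The sketch below imitates that with the standard `subprocess` module; it is not Spark's API. The `rdd_pipe` helper and the `UPPER` command are hypothetical, and the external "program" is a Python one-liner launched through `sys.executable` so the example runs wherever Python does. In Spark the command would typically be a shell script.

```python
import subprocess
import sys


def rdd_pipe(parts, command):
    # Sketch of rdd.pipe(command): start the command once PER PARTITION,
    # write each element as one line to its stdin, and turn the lines it
    # prints on stdout into the new partition (always strings).
    out = []
    for p in parts:
        stdin_text = "".join(f"{x}\n" for x in p)
        result = subprocess.run(
            command, input=stdin_text, capture_output=True, text=True, check=True
        )
        out.append(result.stdout.splitlines())
    return out


# Hypothetical external program: a Python one-liner that upper-cases its input.
UPPER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

print(rdd_pipe([["a", "b"], ["c"], []], UPPER))  # [['A', 'B'], ['C'], []]
print(rdd_pipe([[1, 2]], UPPER))                 # [['1', '2']]: numbers come back as strings
```

Since elements go out as text and come back as text, any structure (numbers, tuples) has to be serialized before the pipe and parsed again afterwards.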