Spark Engine - Transformation Function



Transformations are lazy: they are not computed at the time you write and execute the code.

They are executed only when an action function is called.
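The deferral described above can be illustrated with a minimal sketch. This is a toy model, not the Spark API: the class `LazyDataset` and its `map`, `filter` and `collect` methods are hypothetical names chosen to mirror Spark's vocabulary. Transformations only record what to do, and the action `collect` is what actually runs the recorded work.

```python
# Toy model of lazy evaluation; hypothetical class, NOT the Spark API.
class LazyDataset:
    """Records transformations instead of running them."""

    def __init__(self, data, ops=()):
        self._data = list(data)   # base data
        self._ops = tuple(ops)    # pending (kind, func) pairs

    # Transformations: return immediately, nothing is computed here.
    def map(self, func):
        return LazyDataset(self._data, self._ops + (("map", func),))

    def filter(self, pred):
        return LazyDataset(self._data, self._ops + (("filter", pred),))

    # Action: runs every recorded transformation and returns a result.
    def collect(self):
        result = self._data
        for kind, func in self._ops:
            if kind == "map":
                result = [func(x) for x in result]
            else:
                result = [x for x in result if func(x)]
        return result


calls = []  # records each time the map function actually runs

def double(x):
    calls.append(x)
    return x * 2

ds = LazyDataset([1, 2, 3]).map(double).filter(lambda x: x > 2)
print(calls)         # prints [] : no work has been done yet
print(ds.collect())  # prints [4, 6]
print(calls)         # prints [1, 2, 3] : the action triggered the map
```

The side-effect list `calls` stays empty until `collect` is called, which is the behaviour the text attributes to transformations and actions.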

Spark transformations create a new data structure from an existing one, so successive transformations form a chain.
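A minimal sketch of such a chain, assuming each derived data structure simply keeps a pointer to its parent. `Node`, `chain` and `compute` are hypothetical names for illustration only; Spark's real internals are more involved. Each `map` returns a new object and leaves the original untouched.

```python
# Toy model of a transformation chain; hypothetical class, NOT the Spark API.
class Node:
    """An immutable dataset node that points back to its parent."""

    def __init__(self, parent=None, op_name="source", func=None, data=None):
        self.parent = parent      # the node this one was derived from
        self.op_name = op_name    # label of the transformation
        self.func = func          # function applied to the parent's rows
        self.data = data          # only set on the source node

    def map(self, func, name="map"):
        # Returns a NEW node; `self` is left unchanged.
        return Node(parent=self, op_name=name, func=func)

    def chain(self):
        # Walk back to the source to list the transformations in order.
        names, node = [], self
        while node is not None:
            names.append(node.op_name)
            node = node.parent
        return list(reversed(names))

    def compute(self):
        # Rebuild the rows by replaying the chain from the source.
        if self.parent is None:
            return list(self.data)
        return [self.func(x) for x in self.parent.compute()]


base = Node(data=[1, 2, 3])
as_float = base.map(float, name="to_float")
halved = as_float.map(lambda x: x / 2, name="halve")

print(halved.chain())    # prints ['source', 'to_float', 'halve']
print(halved.compute())  # prints [0.5, 1.0, 1.5]
print(base.chain())      # prints ['source'] : the base is not modified
```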

Spark remembers the set of transformations that are applied to a base data structure (its lineage).

It can then optimize the required calculations and automatically recover from failures and slow workers.
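The two benefits above can be sketched together in a toy model. This is an illustration under stated assumptions, not Spark's implementation: the data is assumed to be split into partitions, computed partitions are assumed to be kept in a per-partition cache, and `LineageDataset` is a hypothetical class. Consecutive `map` functions are applied to each row in a single pass (one simple form of optimization), and a lost partition is rebuilt by replaying the recorded lineage.

```python
# Toy model of lineage-based recovery; hypothetical class, NOT Spark's code.
# Assumption: data is split into partitions and each computed partition is
# cached; a lost cache entry is rebuilt by replaying the recorded lineage.
class LineageDataset:
    def __init__(self, partitions, ops=()):
        self.partitions = [list(p) for p in partitions]  # base data
        self.ops = tuple(ops)                            # recorded lineage
        self.cache = {}                                  # index -> rows

    def map(self, func):
        return LineageDataset(self.partitions, self.ops + (func,))

    def _compute_partition(self, i):
        # Pipelining: each row goes through every function in one pass,
        # so no intermediate list is built per transformation.
        out = []
        for row in self.partitions[i]:
            for func in self.ops:
                row = func(row)
            out.append(row)
        return out

    def get_partition(self, i):
        if i not in self.cache:  # never computed, or lost
            self.cache[i] = self._compute_partition(i)
        return self.cache[i]

    def collect(self):
        return [row for i in range(len(self.partitions))
                for row in self.get_partition(i)]


ds = LineageDataset([[1, 2], [3, 4]]).map(lambda x: x + 1).map(lambda x: x * 10)
print(ds.collect())  # prints [20, 30, 40, 50]

del ds.cache[1]      # simulate losing partition 1 (e.g. a failed worker)
print(ds.collect())  # prints [20, 30, 40, 50] : rebuilt from the lineage
```

Because only the lineage and the base data are needed, recovery here recomputes just the missing partition rather than the whole dataset.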


Examples of transformations:

  • Converting an integer into a float.
  • Filtering a set of values.
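Python's built-in `map()` and `filter()` also return lazy iterators, so they make a convenient plain-Python stand-in for the two example transformations. This is ordinary Python, not PySpark; `list()` plays the role of the action.

```python
# Plain Python, not PySpark: map() and filter() build lazy iterators.
values = [1, 2, 3, 4, 5]

as_floats = map(float, values)                 # transformation: int -> float
large = filter(lambda x: x >= 3.0, as_floats)  # transformation: keep x >= 3.0

# Nothing has been computed yet; list() forces the evaluation.
result = list(large)
print(result)  # prints [3.0, 4.0, 5.0]
```

One difference worth noting: a Python iterator can be consumed only once, whereas Spark keeps the lineage and can recompute the result. In PySpark the same pipeline on an RDD would read roughly `sc.parallelize(values).map(float).filter(lambda x: x >= 3.0).collect()`, assuming an existing `SparkContext` named `sc`.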
