PySpark - Closure

About

Spark automatically creates closures:

for functions that run on RDDs at workers,
and for any global variables that are used by those workers.

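One closure is sent per worker for every task.

Closures are one-way: they go from the driver to the workers, and changes that a worker makes to a variable in its closure are not sent back to the driver.

Each worker receives the code it runs through a closure.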

When you perform transformations and actions that use functions, Spark will automatically push a closure containing that function to the workers so that it can run at the workers.
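The behaviour described above can be illustrated with a minimal sketch. It assumes a local PySpark installation and that the file is run as a script. All names in it (`factor`, `counter`, `scale`, `count_element`, `run_demo`) are hypothetical and chosen only for illustration.

```python
# Hypothetical names throughout: factor, counter, scale, count_element, run_demo.

factor = 10   # global that is only read by the function shipped to the workers
counter = 0   # global that the workers will try to modify


def scale(x):
    # `factor` is not an argument: it reaches the workers inside the closure.
    return x * factor


def count_element(x):
    # Increments the `counter` of whichever Python process runs this function.
    global counter
    counter += 1


def run_demo():
    # Imported here so the functions above can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "closure-demo")
    try:
        rdd = sc.parallelize(range(1, 5), 2)
        scaled = rdd.map(scale).collect()   # closure carries scale and factor
        rdd.foreach(count_element)          # each task updates its own copy
        return scaled, counter              # `counter` is read on the driver
    finally:
        sc.stop()
```

Calling `run_demo()` should return the scaled values together with the driver's `counter`. Because the closure only travels from the driver to the workers, the driver's `counter` is expected to remain 0 even though every element was counted on a worker. When a value has to come back to the driver, an accumulator or an action such as `count()` is the usual choice.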
