PySpark provides two types of shared variables: broadcast variables and accumulators.
Broadcast variables are an efficient way to send a read-only value to each worker once, instead of having Spark automatically ship a copy of it with every task's closure.
Accumulators can only be added to by workers and read by the driver program.
They let us aggregate values from the workers back to the driver.