PySpark

PySpark provides two types of shared variables: broadcast variables and accumulators.

Broadcast

Broadcast variables are an efficient way to send read-only data to each worker once, instead of having Spark ship it automatically inside the closure of every task that uses it.

Accumulator

Accumulators are write-only from the workers' point of view: tasks can only add to them, and only the driver program can read their value.

They allow us to aggregate values from workers back to the driver.