Spark Engine - Aggregation

Spark Query Plan Generation


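Before a grouped aggregation can produce its final result, Spark typically shuffles the data so that all rows with the same grouping key end up in the same partition.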

Spark Engine - Shuffle

A shuffle redistributes rows across partitions, usually so that rows sharing a key land in the same partition.

spark.sql.shuffle.partitions configures the number of partitions to use when shuffling data for joins or aggregations (default: 200).
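To make the idea concrete, here is a minimal pure-Python sketch of a hash-partitioned shuffle followed by a per-partition aggregation. It is not Spark's implementation: the function names (partition_for, shuffle, aggregate), the sample rows, and the use of crc32 as the hash are all assumptions chosen for illustration, and NUM_SHUFFLE_PARTITIONS only plays the role that spark.sql.shuffle.partitions plays in Spark.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # crc32 is a deterministic stand-in for Spark's own hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(partitions, num_partitions):
    """Redistribute (key, value) rows so equal keys share one output partition."""
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[partition_for(key, num_partitions)].append((key, value))
    return out


def aggregate(partition):
    """Sum values per key within a single partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


# Hypothetical input: rows for the same key are scattered across partitions.
input_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("a", 6)],
]

NUM_SHUFFLE_PARTITIONS = 2  # analogous to spark.sql.shuffle.partitions

shuffled = shuffle(input_partitions, NUM_SHUFFLE_PARTITIONS)
results = [aggregate(p) for p in shuffled]
```

Because every row for a given key is routed to the same output partition, each partition can be aggregated independently and no key appears in more than one result.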

Task Runner