Spark Engine - Shuffle

Spark Query Plan Generation

About

shuffle means moving data rows by rows between partition.

Conf

  • spark.sql.shuffle.partitions - Configures the number of partitions to use when shuffling data for joins or aggregations.





Discover More
Card Puncher Data Processing
Spark DataSet - Partition

org/apache/spark/sql/DataFrameWriterpartitionBy(scala.collection.Seq org/apache/spark/sql/DataFrameWriterpartitionBy(String... colNames) org/apache/spark/sql/DatasetforeachPartition(func) - Runs func...
Spark Query Plan Generation
Spark Engine - Aggregation

Before an aggregation, there is a shuffle taking place
Spark Query Plan Generation
Spark Engine - Join

Before a join, there is a shuffle taking place



Share this page:
Follow us:
Task Runner