- foreachPartition(func) - Runs func on each partition of this Dataset.
- mapPartitions(func) - Returns a new Dataset that contains the result of applying func to each partition. (Marked Experimental.)
- sortWithinPartitions - Returns a new Dataset with each partition sorted by the given expressions. This is the same operation as “SORT BY” in SQL (Hive QL).
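The three partition-level operations above can be sketched as follows. This is a minimal illustration, not taken from the original notes: the local session settings, the sample data, and the variable names are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("partition-ops")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data spread over 2 partitions.
val ds = Seq(5, 3, 8, 1, 9, 2).toDS().repartition(2)

// mapPartitions: func receives an Iterator over one partition and returns an Iterator.
// Here each partition is reduced to a single value, its row count.
val rowsPerPartition = ds.mapPartitions(rows => Iterator(rows.size)).collect()

// sortWithinPartitions: rows are ordered inside each partition, with no global ordering.
val sortedLocally = ds.sortWithinPartitions($"value")

// foreachPartition: runs a side effect once per partition, on the executors.
// The explicit parameter type avoids the Scala/Java overload ambiguity.
sortedLocally.foreachPartition { (rows: Iterator[Int]) =>
  println(rows.mkString("partition: [", ", ", "]"))
}
```

mapPartitions is typically preferred over map when per-partition setup (for example opening a connection) is expensive, since func is invoked once per partition rather than once per row.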
- coalesce(1) - Coalesces the data into a single partition, so that it is written out as a single file.
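A minimal sketch of coalescing to a single output file. The sample data, the CSV format, and the output path /tmp/coalesce-example are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("coalesce-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical data, deliberately spread over 8 partitions.
val df = (1 to 100).toDF("id").repartition(8)

// coalesce(1) merges the existing partitions into one without a full shuffle,
// so the write below produces a single part file in the output directory.
val single = df.coalesce(1)

// Hypothetical output path.
single.write.mode("overwrite").csv("/tmp/coalesce-example")
```

Note that a single partition means a single task does the whole write, so this is only reasonable for small outputs.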
Repartition
Repartition creates partitions based on the user’s input by performing a full shuffle of the data (after it is read).
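- repartition by column - hash partitions the data by the given column(s); when no partition count is passed, the number of partitions is taken from spark.sql.shuffle.partitions
- repartitionByRange - range partitions the data by the given column(s)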
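The repartition variants listed above might be used like this. It is a sketch under assumptions: the column names key and n, the sample rows, and the partition counts are all made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("repartition-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("a", 1), ("b", 2), ("c", 3), ("a", 4)).toDF("key", "n")

// Explicit partition count: full shuffle into exactly 4 partitions.
val byCount = df.repartition(4)

// By column only: hash partitioning on `key`. The partition count comes from
// spark.sql.shuffle.partitions (adaptive query execution, if enabled, may coalesce them).
val byColumn = df.repartition(col("key"))

// By column with an explicit count: rows with the same `key` share a partition.
val byColumnAndCount = df.repartition(4, col("key"))

// Range partitioning: each partition holds a contiguous range of `n` values.
val byRange = df.repartitionByRange(2, col("n"))

println(s"byCount: ${byCount.rdd.getNumPartitions} partitions")
println(s"byColumnAndCount: ${byColumnAndCount.rdd.getNumPartitions} partitions")
```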
Number of dynamic partitions created is XXXX, which is more than 1000.
The hive.exec.max.dynamic.partitions configuration must be set before starting the Spark application. The error it addresses:
Number of dynamic partitions created is 2100, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 2100.;
Issuing a SET command against a running Spark SQL server will not work. You need to set the configuration when the server starts (from https://github.com/apache/spark/pull/18769).
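One way to supply the setting at startup is on the session builder, before the first SparkSession is created. This is a sketch under assumptions, not a confirmed recipe from the original notes: it relies on Spark forwarding properties with the spark.hadoop. prefix to the Hadoop/Hive configuration, it needs Hive support on the classpath, and the application name and the value 2100 (taken from the error message above) are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the Hive property is passed with the `spark.hadoop.` prefix
// so that it is in place before the Hive client is initialized.
val spark = SparkSession.builder()
  .appName("dynamic-partitions") // hypothetical application name
  .config("spark.hadoop.hive.exec.max.dynamic.partitions", "2100")
  .enableHiveSupport()
  .getOrCreate()
```

The same property can also be passed as a --conf option when submitting the application or starting the server, which avoids hard-coding it.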
Documentation / Reference