Spark DataSet - Bucket



A partition may be further divided into buckets.



The `bucketBy` option of the `DataFrameWriter` buckets the output by the given columns. When it is specified, the output is laid out on the file system similarly to Hive's bucketing scheme.

This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.
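The idea behind the bucketing scheme can be sketched as hash partitioning on the bucket column: a row's bucket is its column value's hash modulo the number of buckets, so equal keys always land in the same bucket file. A minimal Python illustration of that idea (the helper and sample data are hypothetical; Spark's actual hash function and file layout differ):

```python
# A minimal sketch of hash-based bucketing (illustrative only;
# Spark's actual hash function and on-disk layout differ).
NUM_BUCKETS = 4

def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Map a bucket-column value to a bucket id: hash modulo bucket count."""
    return hash(value) % num_buckets

rows = [("alice", 10), ("bob", 20), ("carol", 30), ("alice", 40)]

# Group rows by the bucket of their first column, the way bucketing
# on that column would group them into bucket files.
buckets = {}
for name, amount in rows:
    buckets.setdefault(bucket_for(name), []).append((name, amount))

# Equal keys hash identically, so both "alice" rows share one bucket.
```

In Spark itself this is requested through the writer, e.g. `df.write.bucketBy(4, "name").saveAsTable("people")`; `bucketBy` must be combined with `saveAsTable`, since the bucketing metadata is stored in the table catalog.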

Discover More
Spark DataSet - Partition

