Spark DataSet - DSL Operations



Domain-specific-language (DSL) functions are defined in the classes Dataset, Column and functions. They cover operations such as:

  • group by,
  • order,
  • plus, ...
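As a minimal sketch of the three operations listed above, the following Scala object uses groupBy and orderBy from Dataset, plus from Column, and avg from functions. The object name, the helper names, the sample rows and the local master setting are all assumptions made for illustration; they do not come from the Spark documentation.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col}

object DslOperationsSketch {

  // Hypothetical sample rows: (name, gender, age)
  def samplePeople(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("Ann", "F", 34), ("Bob", "M", 28), ("Cid", "M", 45))
      .toDF("name", "gender", "age")
  }

  // plus (Column): adds 10 to every age, equivalent to col("age") + 10
  def agePlusTen(people: DataFrame): DataFrame =
    people.select(col("name"), col("age").plus(10).as("age_plus_10"))

  // group by and order (Dataset), with the avg aggregate from functions
  def avgAgeByGender(people: DataFrame): DataFrame =
    people.groupBy(col("gender"))
      .agg(avg(col("age")).as("avg_age"))
      .orderBy(col("gender"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dsl-operations-sketch")
      .getOrCreate()

    val people = samplePeople(spark)
    agePlusTen(people).show()
    avgAgeByGender(people).show()

    spark.stop()
  }
}
```

Each helper returns a new data frame and leaves its input unchanged, so the calls can be chained or reused in any order.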



With a Spark session and a dataset of rows (i.e. a Spark DataSet - Data Frame):

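  • Scala:
// avg and max come from the functions object
import org.apache.spark.sql.functions.{avg, max}

// spark is an existing SparkSession
val people = spark.read.parquet("...")
val department = spark.read.parquet("...")

people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))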
  • Java:
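// avg and max require: import static org.apache.spark.sql.functions.*;
// spark is an existing SparkSession
Dataset<Row> people = spark.read().parquet("...");
Dataset<Row> department = spark.read().parquet("...");

people.filter(people.col("age").gt(30))
  .join(department, people.col("deptId").equalTo(department.col("id")))
  .groupBy(department.col("name"), people.col("gender"))
  .agg(avg(people.col("salary")), max(people.col("age")));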
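To try the pipeline without any files on disk, the sketch below runs the same filter, join, groupBy and agg chain on small in-memory data frames. The object name, the summarize helper, the sample rows and the local master setting are hypothetical and stand in for the elided "..." inputs; only the DSL calls themselves are taken from the example above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max}

object PeopleByDepartment {

  // Same chain as the example above, with the two data frames passed in
  def summarize(people: DataFrame, department: DataFrame): DataFrame =
    people.filter("age > 30")
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), people("gender"))
      .agg(avg(people("salary")), max(people("age")))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("people-by-department")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows: (gender, age, salary, deptId)
    val people = Seq(
      ("F", 34, 5000.0, 1),
      ("M", 28, 4000.0, 1), // dropped by the age > 30 filter
      ("M", 45, 6000.0, 2),
      ("F", 52, 7000.0, 2)
    ).toDF("gender", "age", "salary", "deptId")

    // Hypothetical rows: (id, name)
    val department = Seq((1, "Sales"), (2, "Engineering")).toDF("id", "name")

    summarize(people, department).show()

    spark.stop()
  }
}
```

Passing the data frames as parameters keeps the DSL chain independent of where the data comes from, so the same function should work with frames read from Parquet files.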

