About

A Dataset is a strongly typed collection of domain-specific objects.

Management

Definition

To define a Dataset of a domain-specific object, an encoder is required. The encoder maps the domain-specific type to Spark's internal type system.

  • Scala
val people = spark.read.parquet("...").as[Person]

  • Java
Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class));
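
The following Scala sketch shows one way the encoder can be supplied, either implicitly or explicitly. It assumes a hypothetical Person case class with name and age fields (the original page does not show its definition), an in-memory sequence in place of the Parquet file, and a local SparkSession.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical domain object, assumed for illustration only.
case class Person(name: String, age: Long)

object DatasetDefinition {
  // Builds a Dataset[Person] by passing the encoder explicitly.
  def build(spark: SparkSession): Dataset[Person] = {
    // Encoders.product derives an encoder for a case class.
    val personEncoder: Encoder[Person] = Encoders.product[Person]
    val data = Seq(Person("Ann", 34L), Person("Bob", 27L))
    spark.createDataset(data)(personEncoder)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-definition")
      .getOrCreate()

    // Explicit encoder.
    val explicitPeople = build(spark)
    explicitPeople.show()

    // Implicit encoder: spark.implicits._ brings encoders for
    // case classes and primitives into scope.
    import spark.implicits._
    val implicitPeople: Dataset[Person] = Seq(Person("Cid", 41L)).toDS()
    implicitPeople.show()

    spark.stop()
  }
}
```

In Scala, the implicit form is usually the shorter one; the explicit form mirrors what the Java API requires with Encoders.bean.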

See

To understand the internal binary representation for data, use the schema function.
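
As a small Scala sketch of the schema function, the code below reads the schema both from an encoder and from a Dataset. The Person case class and its fields are hypothetical and are assumed here only for illustration.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical domain object, assumed for illustration only.
case class Person(name: String, age: Long)

object DatasetSchema {
  // The schema can be read from the encoder alone, without a SparkSession.
  def encoderSchema: StructType = Encoders.product[Person].schema

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-schema")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 34L)).toDS()

    // Print the schema as a tree, then list the column names.
    people.printSchema()
    println(people.schema.fieldNames.mkString(", "))

    // Schema derived from the encoder, for comparison.
    println(encoderSchema.fieldNames.mkString(", "))

    spark.stop()
  }
}
```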

Documentation / Reference