About

The driver is a daemon (service) wrapper created when you get a Spark context (connection); it looks after the lifecycle of the Spark job.

Spark supports many cluster managers. In the context of Yarn, the driver is a synonym for the application master.

The driver:

  • starts as its own service (daemon),
  • connects to a cluster manager (see the sketch after this list),
  • manages the executors: it listens for and accepts incoming connections from its workers (executors) throughout its lifetime. As such, the driver program must be network addressable from the worker nodes.
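For instance, a minimal sketch of starting a driver against a standalone cluster manager (the master URL spark://master-host:7077 is a hypothetical example):

# the driver starts on this machine as its own service,
# registers with the cluster manager at master-host:7077,
# and then listens for incoming connections from the executors
spark-shell --master spark://master-host:7077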

Spark Cluster

You can then create your Spark application interactively.

Every driver has a single SparkContext object.
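A minimal sketch, assuming a local installation: spark-shell starts a driver whose single SparkContext is bound to the variable sc.

spark-shell --master "local[*]"
# inside the shell, the driver's single SparkContext already exists:
#   scala> sc.applicationId   // identifies this driver's application
#   scala> sc.stop()          // stops the context (and the driver's services)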

See Spark - Cluster

Management

Memory

Spark - Configuration

  • spark.driver.memory

The driver in Yarn is the application master. See application master memory
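For example, a sketch giving the driver a 4g heap at launch (the value is illustrative):

# in client mode the driver JVM is already running when your code executes,
# so driver memory must be set at launch (as here) or in spark-defaults.conf
spark-shell --driver-memory 4g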

  • PermGen size is controlled by spark.driver.extraJavaOptions
spark-shell --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m"

Core

Number of threads (i.e. cores).

Spark - Configuration

  • spark.driver.cores
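A sketch, assuming cluster deploy mode (spark.driver.cores only takes effect when the driver runs inside the cluster); the class and jar names are hypothetical:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.driver.cores=2 \
  --class com.example.MyApp my-app.jar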

PID

# directory where the Spark daemons write their PID files (default: /tmp)
export SPARK_PID_DIR=/var/run/spark2
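A possible check once the daemons are running (the file names follow a spark-<user>-<class>-<instance>.pid pattern):

ls $SPARK_PID_DIR/*.pid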

Service port

Spark - Configuration

Conf              | Default          | Description
spark.driver.host | (local hostname) | Hostname or IP address for the driver. Used for communicating with the executors and the standalone Master.
spark.driver.port | (random)         | Port for the driver to listen on. Used for communicating with the executors and the standalone Master.
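For example, a sketch pinning the driver to a known, firewall-friendly endpoint (the hostname and port are hypothetical):

spark-shell \
  --conf spark.driver.host=driver.example.com \
  --conf spark.driver.port=40000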

UI

Defaults to port 4040.

See Spark - Web UI (Driver UI)
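The port can be changed with spark.ui.port; a sketch with an illustrative value:

spark-shell --conf spark.ui.port=4041
# the driver UI is then served at http://<driver-host>:4041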

Machine

The driver machine is the single machine on which the driver runs (and therefore the machine that initiates the Spark job and where summary results are collected).

It can be:

  • client: the local machine,
  • cluster: a machine instantiated by the resource manager.

For instance, on the Yarn cluster manager, see Yarn deployment mode and the sketch below.
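A sketch of both modes with spark-submit on Yarn (the class and jar names are hypothetical):

# client: the driver runs on the local machine
spark-submit --master yarn --deploy-mode client --class com.example.MyApp my-app.jar

# cluster: the driver runs on a machine instantiated by the resource manager
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar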