Spark - Driver


About

The driver is a (daemon|service) wrapper created when you get a Spark context (connection). It looks after the lifecycle of the Spark job.

Spark supports many cluster managers. In the context of Yarn, the driver is a synonym for the application master.

The driver:

  • starts as its own service (daemon),
  • connects to a cluster manager,
  • manages its workers (executors). It listens for and accepts incoming connections from its workers (executors) throughout its lifetime. As such, the driver program must be network-addressable from the worker nodes.


You can then create your Spark application interactively.

Every driver has a single SparkContext object.

See Spark - Cluster
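
For instance, launching an interactive shell starts a driver process on the local machine and exposes its single SparkContext as sc (a minimal sketch; the yarn master URL is just an example):

# Starting spark-shell creates the driver process, which holds the single SparkContext (available as sc)
spark-shell --master yarn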

Management

Memory

Spark - Configuration

  • spark.driver.memory

The driver in Yarn is the application master. See application master memory.

  • PermGen is controlled by spark.driver.extraJavaOptions:
spark-shell --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m"
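
For instance, the driver heap size (spark.driver.memory) can also be set at launch time with the equivalent --driver-memory flag (a minimal sketch; 2g is just an example value):

spark-shell --driver-memory 2g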

Core

Number of threads (i.e. cores) used by the driver.

Spark - Configuration:

  • spark.driver.cores (only applies in cluster mode)
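
For instance, when submitting in cluster mode (a minimal sketch; the application jar and class name are placeholders):

spark-submit --master yarn --deploy-mode cluster --driver-cores 2 --class com.example.MyApp my-app.jar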

PID

# Directory where Spark daemons store their pid files (default: /tmp); typically set in spark-env.sh
export SPARK_PID_DIR=/var/run/spark2

Service port

Spark - Configuration

Conf              | Default          | Desc
spark.driver.host | (local hostname) | Hostname or IP address for the driver. This is used for communicating with the executors and the standalone Master.
spark.driver.port | (random)         | Port for the driver to listen on. This is used for communicating with the executors and the standalone Master.
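
For instance, to pin the driver to a fixed address and port so that the executors can reach it (for example through a firewall), a minimal sketch where the address and port are just example values:

spark-shell --conf spark.driver.host=192.168.1.10 --conf spark.driver.port=40000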

UI

Defaults to 4040.

See Spark - Web UI (Driver UI)
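
If port 4040 is already in use (for instance by another driver on the same machine), the UI port can be changed via spark.ui.port (a minimal sketch; 4041 is just an example value):

spark-shell --conf spark.ui.port=4041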

Machine

The driver machine is the single machine where the driver runs (and therefore the machine that initiates the Spark job and where summary results are collected).

It can be:

  • client: the local machine (the machine from which the job was submitted)
  • cluster: a machine (node) allocated by the resource manager inside the cluster

For instance, on the Yarn cluster manager, see Yarn deployment mode.
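
A minimal sketch of both deploy modes (the application jar and class name are placeholders):

# client mode: the driver runs on the local (submitting) machine
spark-submit --master yarn --deploy-mode client --class com.example.MyApp my-app.jar

# cluster mode: the driver runs on a node allocated by the resource manager
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar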




