About
The driver is a daemon/service wrapper created when you get a Spark context (connection); it looks after the lifecycle of the Spark job.
Spark supports many cluster managers; in the context of YARN, the driver is a synonym for the application master.
The driver:
- starts as its own service (daemon),
- connects to a cluster manager,
- gets the workers (executors),
- manages them. It listens for and accepts incoming connections from its workers (executors) throughout its lifetime. As such, the driver program must be network addressable from the worker nodes.
You can then create your Spark application interactively.
Every driver has a single SparkContext object.
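For example, launching an interactive shell starts a driver process on the launching machine and exposes its single SparkContext as the sc variable. A minimal sketch; the master URL and deploy mode are assumptions and depend on your cluster:
spark-shell --master yarn --deploy-mode client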
See Spark - Cluster
Management
Memory
- spark.driver.memory
The driver in YARN is the application master. See application master memory
- PermGen is controlled by spark.driver.extraJavaOptions
spark-shell --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m"
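Driver memory can also be set at submit time with the --driver-memory option (equivalent to spark.driver.memory). A minimal sketch, assuming a hypothetical application class and jar:
spark-submit --driver-memory 2g --class com.example.MyApp my-app.jar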
Core
Number of threads (i.e. cores).
See Spark - Configuration: spark.driver.cores
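A hedged example of setting it at submit time (spark.driver.cores only takes effect in cluster deploy mode; the application class and jar are hypothetical):
spark-submit --master yarn --deploy-mode cluster --conf spark.driver.cores=2 --class com.example.MyApp my-app.jar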
PID
export SPARK_PID_DIR=/var/run/spark2
Service port
| Conf | Default | Description |
|---|---|---|
| spark.driver.host | (local hostname) | Hostname or IP address of the driver. Used for communicating with the executors and the standalone Master. |
| spark.driver.port | (random) | Port for the driver to listen on. Used for communicating with the executors and the standalone Master. |
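These can be overridden at submit time, for instance when the driver must advertise a fixed, reachable address to the executors. The address, port, class, and jar below are example values:
spark-submit --conf spark.driver.host=10.0.0.5 --conf spark.driver.port=51000 --class com.example.MyApp my-app.jar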
UI
The driver web UI defaults to port 4040.
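The port can be changed with the spark.ui.port property, for example:
spark-shell --conf spark.ui.port=4041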
Machine
The driver machine is the single machine where the driver runs (and therefore the machine that initiates the Spark job and where summary results are collected).
It can be:
- client: the local machine
- cluster: the resource manager instantiates a machine inside the cluster
For instance, on the YARN cluster manager, see Yarn deployment mode
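A hedged illustration with spark-submit on YARN (the application class and jar are hypothetical): the first command runs the driver on the local (client) machine, the second runs it inside a container allocated by YARN.
spark-submit --master yarn --deploy-mode client --class com.example.MyApp my-app.jar
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar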