The driver is a (daemon|service) wrapper, created when you get a Spark context (connection), that looks after the lifecycle of the Spark job.
Spark supports many cluster managers; in the context of Yarn, the driver is a synonym for the application master.
The driver:
  * lets you create your Spark application interactively (once the Spark context / connection is established).
  * holds a single SparkContext object.
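For instance (a minimal sketch, with an illustrative master URL), starting the shell as below launches a driver JVM on the local machine and exposes its single SparkContext as the pre-built `sc` variable:

```bash
# Starting spark-shell creates a driver; `sc` (SparkContext) and `spark` (SparkSession) are pre-defined in it
spark-shell --master "local[2]"
```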
See Spark - Cluster
The driver in Yarn is the application master. See application master memory
# Launch the shell with extra JVM options for the driver (the MaxPermSize value is just an example)
spark-shell --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m"
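The same JVM options can also be passed when submitting an application; the sketch below uses the spark-submit flag for driver JVM options (the class and jar names are hypothetical):

```bash
# Pass driver JVM options at submit time; com.example.MyApp / my-app.jar are placeholder names
spark-submit --driver-java-options "-XX:MaxPermSize=384m" --class com.example.MyApp my-app.jar
```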
Number of threads (i.e. cores) for the driver: see Spark - Configuration, spark.driver.cores.
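As a sketch (the core count, class and jar names are illustrative), the property is typically passed at submit time; note that spark.driver.cores is only honoured in cluster deploy mode, where the cluster manager allocates the driver container:

```bash
# Ask for 2 cores for the driver container (cluster deploy mode)
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.driver.cores=2 \
  --class com.example.MyApp my-app.jar
```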
# Directory where the Spark daemon scripts write their PID files (the path is just an example)
export SPARK_PID_DIR=/var/run/spark2
| Property | Default | Description |
|---|---|---|
| spark.driver.host | (local hostname) | Hostname or IP address for the driver. Used for communicating with the executors and the standalone Master. |
| spark.driver.port | (random) | Port for the driver to listen on. Used for communicating with the executors and the standalone Master. |
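A sketch of pinning both properties at launch time (the address and port below are purely illustrative values, useful for instance when the driver sits behind a firewall):

```bash
# Fix the driver's bind address and listening port so executors can reach it
spark-shell \
  --conf spark.driver.host=10.0.0.5 \
  --conf spark.driver.port=51000
```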
The driver web UI port (spark.ui.port) defaults to 4040.
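If 4040 is already taken (for example by another driver on the same machine), the port can be changed; the value below is just an example:

```bash
# Run the driver's web UI on a non-default port
spark-shell --conf spark.ui.port=4041
```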
The driver machine is the single machine where the driver runs; it is therefore the machine that initiates the Spark job and where summary results are collected.
It can be:
  * the client machine from which the job is submitted (client deploy mode), or
  * a node inside the cluster (cluster deploy mode).
For instance, on the Yarn cluster manager, see Yarn deployment mode.
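As a sketch (class and jar names are hypothetical), the deploy mode decides where the driver machine is when submitting to Yarn:

```bash
# Client mode: the driver runs on the machine that runs spark-submit
spark-submit --master yarn --deploy-mode client  --class com.example.MyApp my-app.jar

# Cluster mode: the driver runs inside the Yarn application master container
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar
```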