- start as its own service (daemon),
- connect to a cluster manager,
- get the workers (executors),
- manage them. It listens for and accepts incoming connections from its workers (executors) throughout its lifetime. As such, the driver program must be network addressable from the worker nodes.
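Because executors open connections back to the driver, one quick way to check that the driver is network addressable is to attempt a plain TCP connection from a worker node to the driver's host and port. The sketch below uses only the Python standard library. The function name driver_is_reachable and the example host and port are hypothetical, and the check only proves TCP connectivity; it does not confirm that a Spark driver is the process listening.

```python
import socket


def driver_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    Hypothetical helper: run it on a worker node against the driver's
    host (spark.driver.host) and port (spark.driver.port).
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unresolvable names and timeouts all land here
        return False
```

For example, driver_is_reachable("driver-host.example", 40000) run on a worker would return False if a firewall or a wrong spark.driver.host value prevents executors from reaching the driver.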
You can then create your Spark application interactively.
Every driver has a single SparkContext object.
See Spark - Cluster
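- The driver's PermGen size is controlled by spark.driver.extraJavaOptions (PermGen exists only up to Java 7; Java 8 replaced it with Metaspace and ignores -XX:MaxPermSize):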
spark-shell --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m"
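All driver JVM flags have to travel inside a single spark.driver.extraJavaOptions value, so as soon as there is more than one flag the value contains spaces and must be quoted for the shell. The sketch below builds such a command line with Python's shlex.join; the function name spark_shell_command is hypothetical, and -XX:+UseG1GC is only an illustrative second JVM flag.

```python
import shlex
from typing import Sequence


def spark_shell_command(java_options: Sequence[str]) -> str:
    """Build a shell-safe spark-shell command that passes JVM options to the driver."""
    # Every driver JVM flag goes into ONE property value, separated by spaces
    value = " ".join(java_options)
    argv = ["spark-shell", "--conf", f"spark.driver.extraJavaOptions={value}"]
    # shlex.join single-quotes any argument containing spaces or shell metacharacters
    return shlex.join(argv)
```

spark_shell_command(["-XX:MaxPermSize=384m"]) returns the command shown above without quotes (nothing needs quoting), while spark_shell_command(["-XX:MaxPermSize=384m", "-XX:+UseG1GC"]) returns spark-shell --conf 'spark.driver.extraJavaOptions=-XX:MaxPermSize=384m -XX:+UseG1GC'.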
Number of threads (i.e. cores) used by the driver process:
Spark - Configuration. spark.driver.cores (only used in cluster deploy mode)
|spark.driver.host||(local hostname)||Hostname or IP address for the driver. This is used for communicating with the executors and the standalone Master.|
|spark.driver.port||(random)||Port for the driver to listen on. This is used for communicating with the executors and the standalone Master.|
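Since spark.driver.port is random by default, a common adjustment when a firewall sits between the driver and the executors is to pin it, together with spark.driver.host, to known values. The sketch below turns those two properties into spark-submit --conf arguments; the function name driver_conf_args and the example address 10.0.0.5 and port 40000 are hypothetical.

```python
from typing import Dict, List


def driver_conf_args(host: str, port: int) -> List[str]:
    """Turn a fixed driver host and port into spark-submit --conf arguments."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    props: Dict[str, str] = {
        "spark.driver.host": host,       # address the executors use to reach the driver
        "spark.driver.port": str(port),  # fixed instead of random, so it can be opened in a firewall
    }
    args: List[str] = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The returned list can be spliced into a spark-submit argument list before the application file.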
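The driver web UI listens on port 4040 by default (spark.ui.port). This is separate from spark.driver.port above, which is random by default.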
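When several SparkContexts run on the same host, the Spark documentation states that their UIs bind to successive ports starting at 4040 (4041, 4042, and so on), up to spark.port.maxRetries retries (default 16). The sketch below imitates that probing with the standard library; it is not Spark's code, the function name first_free_port is hypothetical, and a bind test is inherently racy because another process may take the port right after the check.

```python
import socket
from typing import Optional


def first_free_port(start: int = 4040, max_retries: int = 16,
                    host: str = "127.0.0.1") -> Optional[int]:
    """Return the first bindable port in start, start+1, ..., start+max_retries, or None."""
    last = min(start + max_retries, 65535)
    for port in range(start, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # port already in use, try the next one
            return port  # leaving the with-block releases the port again
    return None
```

With nothing else running, first_free_port() would usually return 4040; if a first Spark UI already holds 4040, it would move on to 4041.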
The driver machine is the single machine on which the driver runs: it initiates the Spark job, and summary results are collected there.
Depending on the deploy mode, it can be:
- client: the local machine from which the application is submitted
- cluster: a machine inside the cluster, on which the cluster manager starts the driver
For instance, for the YARN cluster manager, see Yarn deployment mode.
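The deploy mode is chosen on the spark-submit command line with --deploy-mode, and it also decides whether driver-side options such as --driver-cores (documented as cluster-mode only) make sense. The sketch below assembles such an argument list; the function name submit_command and the application file app.py are hypothetical, and rejecting --driver-cores in client mode is a design choice of this sketch, not Spark behavior.

```python
from typing import List, Optional


def submit_command(app: str, deploy_mode: str, master: str = "yarn",
                   driver_cores: Optional[int] = None) -> List[str]:
    """Assemble a spark-submit argument list for the given deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    # client:  the driver runs inside this spark-submit process on the local machine
    # cluster: the cluster manager launches the driver on a machine in the cluster
    argv = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if driver_cores is not None:
        if deploy_mode != "cluster":
            # The local spark-submit JVM is the driver, so a driver core count
            # cannot be requested from the cluster manager in client mode
            raise ValueError("--driver-cores only applies in cluster deploy mode")
        argv += ["--driver-cores", str(driver_cores)]
    return argv + [app]
```

submit_command("app.py", "cluster", driver_cores=2) yields spark-submit --master yarn --deploy-mode cluster --driver-cores 2 app.py, while the same call with "client" is refused.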