A Spark Connection is:
- the entry point to Spark functionality, and the first object to create in a script
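This object is called:
- a SparkContext for RDDs (in Spark 1.x), with an SQLContext built on top of it for DataFrames
- a SparkSession for Datasets (DataFrames) in Spark 2.0 and later. It replaces the SQLContext (and HiveContext) and wraps the underlying SparkContext.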
Each instance of a context object corresponds to one Spark application.
A Spark Connection object:
- connects to a cluster manager (which allocates resources across applications)
- acquires executors on nodes in the cluster
- sends your application code (defined by JAR or Python files passed to SparkContext) to the executors.
- sends tasks to the executors to run.
Every driver (Spark program) has a single Spark Connection object.
- When using the interactive Spark shell, this object is created automatically and named sc (since Spark 2.0 the shell also provides a SparkSession named spark).
- When writing a standalone application, you must create one yourself.