Spark - Connection (Context)

1 - About

A Spark connection is the object through which a driver program accesses a Spark cluster.

Depending on the API, this object is called:

  • a SparkContext for an RDD (the entry point in Spark 1.x)
  • a SparkSession for a dataset (dataframe), which supersedes the SQLContext of Spark 1.x

Each instance of a context object represents one Spark application.

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program).

A Spark Connection object:

  • connects to a cluster manager (which allocates resources across applications)
  • acquires executors on nodes in the cluster
  • sends your application code (defined by JAR or Python files passed to SparkContext) to the executors
  • sends tasks to the executors to run

Every driver (Spark program) has a single Spark Connection object.

3 - Management

3.1 - Initialization


  • When using the interactive Spark shell, this object is created for you automatically and given the name sc.
  • When writing an application, you need to create one yourself.
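
As an illustration of the second case, here is a minimal PySpark sketch rather than a definitive recipe. The helper names session_options and create_session are hypothetical, and the pyspark package is assumed to be installed. SparkSession.builder, config and getOrCreate are the standard PySpark entry points.

```python
from typing import Optional


def session_options(app_name: str, master: Optional[str] = None) -> dict:
    """Collect the settings handed to the session builder (hypothetical helper)."""
    options = {"spark.app.name": app_name}
    if master is not None:
        # Only override the master when one is given explicitly;
        # otherwise Spark falls back to its own configuration.
        options["spark.master"] = master
    return options


def create_session(app_name: str, master: Optional[str] = None):
    """Create (or reuse) the SparkSession of this driver program."""
    # Imported lazily so that this module loads even where PySpark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in session_options(app_name, master).items():
        builder = builder.config(key, value)
    # getOrCreate returns the existing session if the driver already has one.
    return builder.getOrCreate()
```

Under these assumptions, create_session("connection-demo", master="local[2]") would start a local session with two worker threads, and the underlying SparkContext would be available through the sparkContext attribute of the returned session.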

3.2 - Master connection URL

By default, the master connection URL is read from the configuration file (the spark.master property).
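
The lookup can be sketched in plain Python as follows. This is an illustration only, not Spark's own loader: read_master and resolve_master are hypothetical helpers, the sample text imitates a spark-defaults.conf file with whitespace-separated key and value, and the host name master-host is made up. The sketch assumes that a master set explicitly in code takes precedence over the file.

```python
from typing import Optional


def read_master(conf_text: str) -> Optional[str]:
    """Return the spark.master value found in spark-defaults.conf style text."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key and value separated by whitespace
        if len(parts) == 2 and parts[0] == "spark.master":
            return parts[1].strip()
    return None


def resolve_master(explicit: Optional[str], conf_text: str) -> Optional[str]:
    """Prefer a master set in code, else fall back to the configuration text."""
    return explicit if explicit is not None else read_master(conf_text)


# Hypothetical configuration content used for illustration.
sample = """
# Example configuration (made-up host name)
spark.master    spark://master-host:7077
spark.app.name  demo
"""
```

With this sample, resolve_master(None, sample) yields the URL from the file, while resolve_master("local[*]", sample) keeps the explicit value.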
