Spark - Application

1 - About

An application is an instance of a driver, created via the initialization of a Spark context (RDD API) or a Spark session (Dataset API).

This instance can be created interactively (for example in the spark-shell) or by submitting a packaged application with spark-submit.

Within each Spark application, multiple jobs may run concurrently if they were submitted from separate threads.
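A sketch of this, assuming an existing SparkContext named sc (each action call triggers a job):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Two actions submitted from separate threads: the resulting jobs
// may be scheduled concurrently within the same application
val job1 = Future { sc.parallelize(1 to 100).sum() }
val job2 = Future { sc.parallelize(1 to 100).count() }

val total = Await.result(job1, 1.minute)  // 5050.0
val n     = Await.result(job2, 1.minute)  // 100
```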

A typical script starts with a session (or context) and defines Datasets or DataFrames: table-like objects on which you can perform data operations.

See self-contained-applications

3 - Example
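A minimal self-contained application could look like the following sketch (the object name and values are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object MyApp {
  def main(args: Array[String]): Unit = {
    // The session is the entry point of the application
    val spark = SparkSession.builder()
      .appName("My app")
      .getOrCreate()

    // A Dataset: a table-like object on which data operations are performed
    val ds = spark.range(1, 101)
    println(ds.selectExpr("sum(id)").first().getLong(0))  // 5050

    spark.stop()
  }
}
```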

4 - Management

4.1 - Properties

Properties are the configuration values of an application.

See Application properties such as the application name, …

  • In the code

import org.apache.spark.{SparkConf, SparkContext}

// Set properties in the code via SparkConf
val conf = new SparkConf().setAppName("My app").setMaster("local[4]")
val sc = new SparkContext(conf)

// or start from an empty conf (properties are then taken from spark-submit)
val sc2 = new SparkContext(new SparkConf())

  • At the command line (spark-submit)

./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar

4.1.1 - Name

The name is used, for instance, in the name of the log file.

4.2 - Submit

4.2.1 - spark-submit

The spark-submit script is used to launch applications:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Python and R (from the quickstart):

# For Python examples, use spark-submit directly:
./bin/spark-submit examples/src/main/python/

# For R examples, use spark-submit directly:
./bin/spark-submit examples/src/main/r/dataframe.R

4.2.2 - Livy
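Livy is a REST service for interacting with Spark; an application can be submitted as a batch through its HTTP API. A hedged sketch (the server URL, jar path, and class name are assumptions):

```shell
# Submit a batch application through the Livy REST API
# (assumes a Livy server listening on localhost:8998)
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"file": "/path/to/myApp.jar", "className": "com.example.MyApp"}' \
  http://localhost:8998/batches
```

The response contains the batch id, which can then be polled via GET /batches/&lt;id&gt;.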

4.3 - Kill

  • Spark standalone or Mesos with cluster deploy mode

spark-submit --kill [submission ID] --master [spark://...]

  • Java

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

4.4 - Status

  • Spark standalone or Mesos with cluster deploy mode

spark-submit --status [submission ID] --master [spark://...]

4.5 - Runtime execution configuration
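Some properties (notably Spark SQL execution properties) can also be changed at runtime on an existing session. A minimal sketch, assuming a running SparkSession named spark:

```scala
// Set and read back a runtime SQL configuration property
spark.conf.set("spark.sql.shuffle.partitions", "8")
println(spark.conf.get("spark.sql.shuffle.partitions"))  // "8"
```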

