Spark - Application

About

An application is an instance of a driver, created by initializing a Spark context (RDD) or a Spark session (Dataset).

This instance can be created via:

Within each Spark application, multiple jobs may be running concurrently if they were submitted by different threads.
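The concurrency described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the object name ConcurrentJobs, the local[4] master and the toy datasets are assumptions chosen for the example. Each Future runs on its own thread, and each action (count, reduce) submits a separate job to the same application.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical example object; names and sizes are illustrative only.
object ConcurrentJobs {

  // Submits two independent jobs from two different threads and waits for both.
  def run(sc: SparkContext): (Long, Long) = {
    // Job 1: count the even numbers (the action `count` triggers the job).
    val countJob = Future {
      sc.parallelize(1 to 1000000).filter(_ % 2 == 0).count()
    }
    // Job 2: sum a small range (the action `reduce` triggers the job).
    val sumJob = Future {
      sc.parallelize(1 to 1000).map(_.toLong).reduce(_ + _)
    }
    (Await.result(countJob, 5.minutes), Await.result(sumJob, 5.minutes))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 4 threads, so both jobs can make progress.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("ConcurrentJobs"))
    try {
      val (evens, total) = run(sc)
      println(s"evens=$evens total=$total")
    } finally {
      sc.stop()
    }
  }
}
```

While this runs, both jobs should appear under the same application in the Spark UI.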

A typical script starts with a session (context) and defines table-like objects on which you can perform data operations.
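A minimal sketch of such a script is given here, assuming a local master and made-up data. The object name TypicalScript, the column names and the values are hypothetical. It creates the session, defines a DataFrame and an RDD, and runs one operation on each.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; data and names are illustrative only.
object TypicalScript {

  // Defines two table-like objects and performs one operation on each.
  def run(spark: SparkSession): (Long, Double) = {
    import spark.implicits._

    // DataFrame (Dataset of rows) defined from an in-memory sequence.
    val sheep = Seq(("dolly", 3), ("shaun", 5)).toDF("name", "age")
    val olderThanFour = sheep.filter($"age" > 4).count()

    // RDD defined through the SparkContext that the session wraps.
    val numbers = spark.sparkContext.parallelize(1 to 10)
    val sum = numbers.sum()

    (olderThanFour, sum)
  }

  def main(args: Array[String]): Unit = {
    // Creating the session initializes the driver side of the application.
    val spark = SparkSession.builder()
      .master("local[2]")        // assumption: local run with 2 threads
      .appName("TypicalScript")  // hypothetical application name
      .getOrCreate()
    try {
      println(run(spark))
    } finally {
      spark.stop()
    }
  }
}
```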

See self-contained-applications

Example

Management

Properties

Properties are seen as configuration.

See Application properties such as application name, …

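  • In the code
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
             .setMaster("local[2]")
             .setAppName("CountingSheep")
val sc = new SparkContext(conf)
  • At runtime with spark-submit, leaving the conf empty in the code
// alternative to the block above: empty conf in the code
val sc = new SparkContext(new SparkConf())
# the properties are then supplied on the command line
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar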
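To check which properties an application actually received, you can read them back from the running context. The sketch below assumes the conf is left empty in the code, so the values come from spark-submit flags or spark-defaults.conf. The object name ShowProperties and the helper describe are hypothetical. Note that values set directly on a SparkConf in the code take precedence over spark-submit flags.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object for inspecting the effective configuration.
object ShowProperties {

  // Returns the effective properties as sorted "key=value" lines.
  def describe(sc: SparkContext): Seq[String] =
    sc.getConf.getAll
      .sortBy(_._1)
      .map { case (key, value) => s"$key=$value" }
      .toSeq

  def main(args: Array[String]): Unit = {
    // Empty conf: master, name and other properties must be supplied at launch.
    val sc = new SparkContext(new SparkConf())
    try {
      println(s"name=${sc.appName} master=${sc.master}")
      describe(sc).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Launched with the spark-submit command above, the printed list would be expected to include the spark.app.name and spark.master values passed on the command line.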

Name

The name is used, for instance, in the name of the log file.

Submit

spark-submit

The spark-submit script is used to launch applications:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Python and R (from the quickstart):

# For Python examples, use spark-submit directly:
./bin/spark-submit examples/src/main/python/pi.py

# For R examples, use spark-submit directly:
./bin/spark-submit examples/src/main/r/dataframe.R

Livy

Kill

  • Spark standalone or Mesos with cluster deploy mode
spark-submit --kill [submission ID] --master [spark://...]
  • Java
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

Status

  • Spark standalone or Mesos with cluster deploy mode
spark-submit --status [submission ID] --master [spark://...]

Runtime execution configuration
