Spark - Spark-submit



spark-submit is the client tool for submitting a Spark application.

The spark-submit script is used to launch applications on a cluster.

Spark jobs are generally submitted from an edge node.


./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --jars <additional1.jar,additional2.jar> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
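
As a rough illustration of how the placeholders above might be filled in, the sketch below prints (rather than runs) a submission to YARN in cluster deploy mode. The class name, jar paths, memory value, application arguments and the build_submit_cmd helper are all hypothetical values chosen for this example; only the option names come from the template.

```shell
#!/bin/sh
# Dry-run sketch: prints the spark-submit command instead of executing it.
# Every value below is a hypothetical placeholder, not taken from a real project.

build_submit_cmd() {
  main_class="com.example.MyApp"                     # hypothetical main class
  app_jar="/path/to/my-app.jar"                      # hypothetical application jar
  extra_jars="/path/to/dep1.jar,/path/to/dep2.jar"   # comma-separated, no spaces

  # Options come first, then the application jar,
  # then the arguments passed to the application itself ("$@").
  echo ./bin/spark-submit \
    --class "$main_class" \
    --master yarn \
    --deploy-mode cluster \
    --jars "$extra_jars" \
    --conf spark.executor.memory=4g \
    "$app_jar" \
    "$@"
}

# Print the command for two hypothetical application arguments.
build_submit_cmd input.csv output/
```

Removing the echo would run the command for real, assuming a Spark installation in the current directory and a reachable YARN cluster.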



Run application locally on 8 cores

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar
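
The local master URL contains square brackets, which many shells treat as a glob pattern, so quoting it is a reasonable precaution. The sketch below builds the URL from a thread count and prints the resulting command without running it. The local_master helper is hypothetical, introduced only for this example; local[*] is the Spark form that uses as many worker threads as there are logical cores.

```shell
#!/bin/sh
# Sketch: build a local master URL and pass it quoted to spark-submit.
# local_master is a hypothetical helper, not part of Spark.

local_master() {
  # With no argument, fall back to local[*] (one worker thread per logical core).
  printf 'local[%s]\n' "${1:-*}"
}

master="$(local_master 8)"

# Quoting "$master" keeps the shell from interpreting [8] as a glob pattern.
# echo makes this a dry run; remove it to actually submit.
echo ./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master "$master" \
  /path/to/examples.jar
```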

For more details, see Java submit.
