Spark - Jar


Launch external Jar

Jars can be added to a spark-submit command via:

  • Jar files with the:
    • --jars option. It defines a comma-separated list of jar files that will be automatically transferred to the cluster.
  • Maven coordinates with the:
    • --packages option - a comma-delimited list of Maven coordinates
    • --repositories option - a comma-delimited list of additional Maven repositories to search for those coordinates
spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --packages mypackage \
  --class MyClass main-application.jar
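
For instance, Maven coordinates use the groupId:artifactId:version format. A minimal sketch (the coordinate and repository URL below are placeholders, not real artifacts):

# Hypothetical coordinate and repository, for illustration only
spark-submit \
  --packages org.example:my-connector_2.12:1.0.0 \
  --repositories https://repo.example.com/maven2 \
  --class MyClass main-application.jar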

For more, see advanced-dependency-management

Conf

jars

Jars can also be defined in the Spark configuration:

  • spark.jars is the comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed.
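
For instance (a minimal sketch; the jar paths are placeholders), this property can also be passed on the command line with --conf:

# Hypothetical jar locations, for illustration only
spark-submit \
  --conf spark.jars=/opt/libs/additional1.jar,/opt/libs/additional2.jar \
  --class MyClass main-application.jar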

Library Path

  • spark.driver.extraLibraryPath
  • spark.executor.extraLibraryPath

Value example: /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
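
A minimal sketch that sets both properties through --conf, assuming the native Hadoop libraries live at the path shown above:

spark-submit \
  --conf spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
  --class MyClass main-application.jar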

Location

  • Local: SPARK_HOME/jars
  • Local: PYSPARK_HOME/jars
  • Azure:
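
To see which jars ship with a local installation (a sketch assuming a Unix-like shell and that SPARK_HOME is set):

ls "$SPARK_HOME/jars"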




