Spark - Jar

1 - Launch external Jar

Jars can be passed to a spark-submit command via:

  • Jar files with:
    • the --jars option. It defines the paths to jar files that will be automatically transferred to the cluster.
  • Maven coordinates with:
    • the --packages option - a comma-delimited list of Maven coordinates
    • the --repositories option - to define additional Maven repositories to search

spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --packages mypackage \
  --class MyClass main-application.jar

For more, see the advanced-dependency-management section of the Spark documentation.
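
For instance, a minimal sketch of pulling a dependency by Maven coordinates; the coordinate and repository URL below are illustrative placeholders, not values from this page:

spark-submit --packages org.apache.spark:spark-avro_2.12:3.3.0 \
  --repositories https://repos.spark-packages.org \
  --class MyClass main-application.jar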

2 - Conf

2.1 - jars

Jars can also be declared in the Spark configuration:

  • spark.jars is the comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed.
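
As an illustration, this property can be set in spark-defaults.conf or passed with --conf; the jar paths below are placeholders:

# spark-defaults.conf - placeholder paths, adjust to your deployment
spark.jars /opt/libs/additional1.jar,/opt/libs/udf-*.jar

spark-submit --conf spark.jars=/opt/libs/additional1.jar,/opt/libs/udf-*.jar \
  --class MyClass main-application.jar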

2.2 - Library Path

  • spark.driver.extraLibraryPath
  • spark.executor.extraLibraryPath

Value example: /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
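
Both properties can be passed at submit time, for example with the native-library path above:

spark-submit \
  --conf spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64 \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64 \
  --class MyClass main-application.jar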

3 - Location

  • Local: SPARK_HOME\jars
  • Local: PYSPARK_HOME\jars
  • Azure:
    • /usr/hdp/current/spark2-client/jars/
    • /usr/hdp/current/hadoop-client/
    • /usr/hdp/current/hadoop-hdfs-client/lib/
    • /usr/hdp/current/hadoop-yarn-client/
    • /usr/hdp/current/spark_llap/ (Hive - Live Long And Process (LLAP))
    • /usr/lib/hdinsight-datalake/ (Azure Data Lake - JAR)
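
To check which jars ship with a local installation, list the SPARK_HOME jars directory (a quick sketch, assuming SPARK_HOME is set):

ls "$SPARK_HOME/jars" | grep spark-sql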
