Launch external Jar
Jars can be defined in a spark-submit command via:
- a jar file with the:
  - --jars option. It defines the paths to the jar files that will be automatically transferred to the cluster.
- Maven coordinates with the:
  - --packages option - a comma-delimited list of Maven coordinates
  - --repositories option - to define additional Maven repositories
spark-submit --jars additional1.jar,additional2.jar \
--driver-class-path additional1.jar:additional2.jar \
--conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
--packages mypackage \
--class MyClass main-application.jar
For more, see advanced-dependency-management
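The options above can be sketched in plain Python by assembling the spark-submit command programmatically (no Spark needed to run this). The jar names, Maven coordinate, and class name are placeholders, not real artifacts:

```python
# Placeholder jars and package; substitute your own artifacts.
jars = ["additional1.jar", "additional2.jar"]
packages = ["com.example:mypackage:1.0"]  # Maven coordinate: groupId:artifactId:version

cmd = [
    "spark-submit",
    "--jars", ",".join(jars),                       # comma-separated jar list
    "--driver-class-path", ":".join(jars),          # colon-separated classpath
    "--conf", "spark.executor.extraClassPath=" + ":".join(jars),
    "--packages", ",".join(packages),               # comma-delimited Maven coordinates
    "--class", "MyClass",
    "main-application.jar",
]
print(" ".join(cmd))
```

Note that --jars and --packages take comma-separated values, while the classpath settings use the platform path separator (colon on Linux).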
Conf
Jars
Jars can also be set via the Spark configuration:
- spark.jars is the comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed.
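Since spark.jars is a comma-separated list that may contain globs, a minimal sketch of how such a value could be expanded locally looks like this (illustration only; Spark performs its own resolution, and expand_jars is a hypothetical helper):

```python
import glob

def expand_jars(spark_jars_value):
    """Expand a comma-separated spark.jars-style value, resolving globs."""
    jars = []
    for entry in spark_jars_value.split(","):
        entry = entry.strip()
        if any(c in entry for c in "*?["):
            jars.extend(sorted(glob.glob(entry)))  # expand glob patterns
        else:
            jars.append(entry)                     # keep literal paths as-is
    return jars
```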
Library Path
- spark.driver.extraLibraryPath
- spark.executor.extraLibraryPath
Value example: /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
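As a sketch, the value above is a colon-separated list (like LD_LIBRARY_PATH) applied to both the driver and executor settings; the directories are the example paths from this page:

```python
# Native library directories from the value example above.
native_dirs = [
    "/usr/hdp/current/hadoop-client/lib/native",
    "/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
]

# The conf value is a colon-separated list, like LD_LIBRARY_PATH.
library_path = ":".join(native_dirs)

# These would be passed to spark-submit, e.g.:
#   spark-submit --conf spark.driver.extraLibraryPath=<library_path> ...
conf = {
    "spark.driver.extraLibraryPath": library_path,
    "spark.executor.extraLibraryPath": library_path,
}
print(conf["spark.driver.extraLibraryPath"])
```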
Location
- Local: SPARK_HOME\jars
- Local: PYSPARK_HOME\jars
- Azure:
- /usr/hdp/current/spark2-client/jars/
- /usr/hdp/current/hadoop-client/
- /usr/hdp/current/hadoop-hdfs-client/lib/
- /usr/hdp/current/hadoop-yarn-client/
- /usr/hdp/current/spark_llap/ - Hive LLAP (Live Long And Process) jars
- /usr/lib/hdinsight-datalake/ - Azure Data Lake jars