Spark - Jar

1 - Launch external Jar

Jars can be passed to a spark-submit command via:

  • Jar files with:
    • the --jars option. It defines the paths to jar files that will be automatically transferred to the cluster.
  • Maven coordinates with:
    • the --packages option - a comma-delimited list of Maven coordinates
    • the --repositories option - to define additional Maven repositories to search

spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --packages mypackage \
  --class MyClass main-application.jar

For more, see the advanced-dependency-management section of the Spark documentation.
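
For instance, a minimal sketch of pulling a dependency by Maven coordinates; the coordinate and repository URL below are illustrative placeholders, not values from this page:

spark-submit --packages org.apache.spark:spark-avro_2.12:3.3.0 \
  --repositories https://repos.spark-packages.org \
  --class MyClass main-application.jar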

2 - Conf

2.1 - jars

Jars can also be declared in the Spark configuration:

  • spark.jars is the comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed.
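
As an illustration, this property can be set in spark-defaults.conf or passed with --conf; the jar paths below are placeholders:

# spark-defaults.conf - placeholder paths, adjust to your deployment
spark.jars /opt/libs/additional1.jar,/opt/libs/udf-*.jar

spark-submit --conf spark.jars=/opt/libs/additional1.jar,/opt/libs/udf-*.jar \
  --class MyClass main-application.jar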

2.2 - Library Path

  • spark.driver.extraLibraryPath
  • spark.executor.extraLibraryPath

Value example: /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
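
Both properties can be passed at submit time, for example with the native-library path above:

spark-submit \
  --conf spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64 \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64 \
  --class MyClass main-application.jar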

3 - Location

  • Local: SPARK_HOME\jars
  • Local: PYSPARK_HOME\jars
  • Azure:
    • /usr/hdp/current/spark2-client/jars/
    • /usr/hdp/current/hadoop-client/
    • /usr/hdp/current/hadoop-hdfs-client/lib/
    • /usr/hdp/current/hadoop-yarn-client/
    • /usr/hdp/current/spark_llap/ (Hive - Live Long And Process (LLAP))
    • /usr/lib/hdinsight-datalake/ (Azure Data Lake - JAR)
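
To check which jars ship with a local installation, list the SPARK_HOME jars directory (a quick sketch, assuming SPARK_HOME is set):

ls "$SPARK_HOME/jars" | grep spark-sql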
