When using Spark-Submit, there are several options for adding JAR files to a Spark job, each with its own implications for classpath, file distribution, and priority.
Spark-Submit influences the ClassPaths through these options:
- --driver-class-path (or the spark.driver.extraClassPath property) prepends entries to the driver's ClassPath.
- --conf spark.executor.extraClassPath prepends entries to each executor's ClassPath.
For a file to be present on both ClassPaths, it needs to be specified in both flags.
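As a minimal sketch, assuming a dependency named mylib.jar that both the driver and the executors must load (the JAR and class names are illustrative):

```shell
# Put mylib.jar on both the driver's and the executors' ClassPath.
# Both options use the Unix ClassPath separator ':' for multiple entries.
spark-submit \
  --driver-class-path mylib.jar \
  --conf spark.executor.extraClassPath=mylib.jar \
  --class com.example.MyApp \
  my-application.jar
```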
File distribution depends on the execution mode. In client mode, the driver runs on the submitting machine, so files listed in --jars are served to the executors from the driver. In cluster mode, the driver itself runs on a worker node, so files must be reachable from every node, either by distributing them manually or by placing them on shared storage such as HDFS.
Spark-Submit supports the following URI prefixes for file distribution: file: (served from the submitting machine), hdfs:, http:, https:, and ftp: (pulled down by each node from the given URI), and local: (expected to already exist at that path on every worker node, so nothing is copied).
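The prefixes can be mixed in a single submission. A sketch, with illustrative JAR names and paths:

```shell
# file:  served from the submitting machine,
# hdfs:  fetched from HDFS by each node,
# local: must already exist at this path on every worker.
spark-submit \
  --jars file:/opt/libs/a.jar,hdfs:///libs/b.jar,local:/opt/libs/c.jar \
  --class com.example.MyApp \
  my-application.jar
```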
The main options affect JAR file handling as follows:
- --jars and SparkContext.addJar: distribute JAR files to the cluster, but do not by themselves add them to the driver or executor ClassPath.
- SparkContext.addFile: distributes arbitrary dependency files, not just JARs.
- --driver-class-path / spark.driver.extraClassPath: modify the driver's ClassPath only.
- --driver-library-path / spark.driver.extraLibraryPath: set the driver's native library path (java.library.path), not the ClassPath.
- spark.executor.extraClassPath: modifies the executors' ClassPath.
- spark.executor.extraLibraryPath: sets the executors' native library path.
Properties set directly on SparkConf have the highest precedence, followed by Spark-Submit flags and then options in spark-defaults.conf. Therefore, any values set in code will override corresponding flags or options.
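For instance, assuming spark-defaults.conf contains the line "spark.executor.memory 1g" (an illustrative value), the flag below wins over that default, while a SparkConf.set() call inside the application would in turn override the flag:

```shell
# Precedence: SparkConf in code > spark-submit flags > spark-defaults.conf.
# This flag overrides the 1g default assumed above; executors get 2g
# unless the application sets spark.executor.memory on its SparkConf.
spark-submit \
  --conf spark.executor.memory=2g \
  --class com.example.MyApp \
  my-application.jar
```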
In client mode, it's safe to add JAR files using all three main options:
spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --class MyClass main-application.jar
However, in cluster mode you should add files only with --jars and distribute them to the worker nodes yourself (or reference them from shared storage such as HDFS). Avoid redundant or incorrect arguments, such as passing JAR files to --driver-library-path, which sets the native library path rather than the ClassPath.
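A cluster-mode sketch, assuming the JARs have already been uploaded to HDFS (the HDFS paths are illustrative):

```shell
# In cluster mode the driver runs on a worker node, so the JARs are
# referenced from shared storage rather than the submitting machine.
spark-submit \
  --deploy-mode cluster \
  --jars hdfs:///libs/additional1.jar,hdfs:///libs/additional2.jar \
  --class MyClass \
  hdfs:///apps/main-application.jar
```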