Adding JAR Files to a Spark Job with Spark-Submit
ClassPath Effects
Setting spark.driver.extraClassPath (or its command-line alias, --driver-class-path) adds entries to the classpath of the driver, while spark.executor.extraClassPath does the same for the executors on the worker nodes. For a JAR to be visible to both, list it in both settings.
Separation Character
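The separator between classpath entries depends on the operating system:
Linux: a colon (:)
Windows: a semicolon (;)
The --jars option is the exception: its entries are always separated by commas.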
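A launcher script can pick the separator instead of hard-coding it. The following is a minimal sketch, assuming the usual JVM convention (colon on Unix-like systems, semicolon on Windows). The helper names classpath_sep and join_classpath are hypothetical, and the uname patterns used to detect Windows shells are an assumption.

```shell
# Hypothetical helpers; they are not part of Spark.

# Print the classpath separator for an OS name as reported by `uname -s`.
classpath_sep() {
  case "$1" in
    MINGW*|MSYS*|CYGWIN*) printf ';' ;;  # assumed names of Windows shells
    *)                    printf ':' ;;  # Linux, macOS, other Unix-likes
  esac
}

# Join the remaining arguments using the separator given as first argument.
join_classpath() {
  sep=$1
  shift
  out=
  for jar in "$@"; do
    # Prefix the separator only when something has already been collected.
    out="${out:+$out$sep}$jar"
  done
  printf '%s' "$out"
}

CP=$(join_classpath "$(classpath_sep "$(uname -s)")" additional1.jar additional2.jar)
echo "$CP"
```

The resulting string can then be passed to --driver-class-path and spark.executor.extraClassPath.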
File Distribution
In client mode, the driver starts an HTTP file server, and each worker node fetches the files from it at startup. In cluster mode, the files must be made available to every worker node through HDFS or other shared storage.
URI Types
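Accepted URI schemes include:
file: - absolute paths and file:/ URIs are served by the driver's HTTP file server, and every executor pulls the file from it.
hdfs:, http:, https:, ftp: - files and JARs are pulled down from the URI as expected.
local: - a URI starting with local:/ is expected to exist as a local file on each worker node, so nothing is transferred over the network.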
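As an illustration of how the scheme decides where a JAR comes from, here is a small sketch that classifies a URI by its prefix. It follows the behaviour described in Spark's "Submitting Applications" guide. The function name describe_jar_uri and the example paths are hypothetical, and schemes not listed in that guide are simply reported as not covered.

```shell
# Hypothetical helper: summarise how a JAR URI is obtained, based on its scheme.
describe_jar_uri() {
  case "$1" in
    local:*)                      echo "expected on every worker node" ;;
    hdfs:*|http:*|https:*|ftp:*)  echo "downloaded from the URI" ;;
    file:*|/*)                    echo "served by the driver file server" ;;
    *)                            echo "not covered by this sketch" ;;
  esac
}

# Example paths are placeholders, not real locations.
for uri in file:/opt/libs/a.jar hdfs:///libs/b.jar local:/opt/libs/c.jar; do
  printf '%s -> %s\n' "$uri" "$(describe_jar_uri "$uri")"
done
```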
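Affected Options
The following options and settings are involved in getting JARs and files onto the driver and the executors:
--jars
SparkContext.addJar(...)
SparkContext.addFile(...)
--driver-class-path or spark.driver.extraClassPath
--driver-library-path or spark.driver.extraLibraryPath
spark.executor.extraClassPath
spark.executor.extraLibraryPath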
Priority
Properties set directly on the SparkConf take the highest precedence, followed by flags passed to spark-submit, and then values in the spark-defaults.conf file.
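The lookup order can be pictured as "first value that is set wins". The sketch below only models that rule, using the order given in Spark's configuration documentation (SparkConf, then spark-submit flags, then spark-defaults.conf). Spark performs this resolution internally, and resolve_conf is a hypothetical name.

```shell
# Hypothetical model of the precedence rule; not how Spark is implemented.
# Arguments, highest precedence first:
#   $1 value set on SparkConf in the application
#   $2 value passed as a spark-submit flag
#   $3 value from spark-defaults.conf
# An empty string means "not set at this level".
resolve_conf() {
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      printf '%s' "$value"
      return 0
    fi
  done
  return 1  # not set anywhere
}

# Not set on SparkConf, so the spark-submit flag wins over the defaults file.
resolve_conf "" "flag.jar" "default.jar"
echo
```

One caveat from Spark's configuration reference: in client mode, spark.driver.extraClassPath cannot be set through SparkConf inside the application, because the driver JVM has already started by then. Use --driver-class-path or the defaults file for it instead.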
For Simplicity
In client mode, one can use the following to add JARs for both driver and workers:
spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --class MyClass main-application.jar
In cluster mode, however, ensure JARs are accessible through a shared storage system.
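A cluster-mode submission might look like the sketch below. It assumes the JARs were already uploaded to a shared HDFS directory and that the cluster manager is YARN. The directory hdfs:///libs, the --master value, and the helper name cluster_submit_cmd are all assumptions for illustration. The function prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical helper: print a cluster-mode spark-submit command for JARs
# stored under the shared directory given as the first argument.
cluster_submit_cmd() {
  jar_dir=$1
  # JARs listed in --jars are comma-separated and are added to the driver and
  # executor classpaths; the application JAR must be reachable by the cluster too.
  echo spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --jars "$jar_dir/additional1.jar,$jar_dir/additional2.jar" \
    --class MyClass \
    "$jar_dir/main-application.jar"
}

# Placeholder location; replace with wherever the JARs were actually uploaded.
cluster_submit_cmd "hdfs:///libs"
```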