How do I resolve the "java.lang.ClassNotFoundException" error in Spark on Amazon EMR?
I want to resolve the "java.lang.ClassNotFoundException" error in Apache Spark on Amazon EMR.
Short description
The java.lang.ClassNotFoundException error in Spark occurs for the following reasons:
- The spark-submit job can't find the relevant files in the class path.
- A bootstrap action or custom configuration overrides the class paths. As a result, the class loader picks up only the JAR files that exist in the location that you specified in your configuration.
Resolution
To resolve the java.lang.ClassNotFoundException error, check the stack trace to find the name of the missing class. Then, add the path of your custom JAR that contains the missing class to the Spark class path. You can add the path of your custom JAR on a running cluster, a new cluster, or when you submit a job.
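The first step above, finding the missing class name and then the JAR that provides it, can be sketched in shell. This is a minimal sketch, not an official tool: the function names missing_classes and find_jar_for_class are hypothetical, and the second function assumes that unzip is installed on the node.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Print the unique class names reported by ClassNotFoundException
# lines read from stdin (for example, a driver or container log).
missing_classes() {
  grep -o 'java\.lang\.ClassNotFoundException: [A-Za-z0-9_.$]*' \
    | sed 's/^java\.lang\.ClassNotFoundException: //' \
    | sort -u
}

# Print each JAR in a directory whose listing contains the given class.
# Assumes unzip is available; JARs that can't be read are skipped.
# Usage: find_jar_for_class com.example.MyClass /home/hadoop/extrajars
find_jar_for_class() {
  local class_file="${1//.//}.class"   # com.example.MyClass -> com/example/MyClass.class
  local dir="$2"
  local jar
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$class_file"; then
      echo "$jar"
    fi
  done
}
```

For example, pipe a saved log into missing_classes, and then pass each class name that it prints to find_jar_for_class together with the directory that holds your custom JARs.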
Add your custom JAR path on a running cluster
In /etc/spark/conf/spark-defaults.conf, add the path of the custom JAR that contains the classes named in the error stack trace to the spark.driver.extraClassPath and spark.executor.extraClassPath properties.
Example:
sudo vim /etc/spark/conf/spark-defaults.conf

spark.driver.extraClassPath <other existing jar locations>:example-custom-jar-path
spark.executor.extraClassPath <other existing jar locations>:example-custom-jar-path
Note: Replace example-custom-jar-path with your custom JAR path.
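If you prefer to script the edit instead of using an editor, the following is a minimal sketch under stated assumptions. The function name append_extra_classpath is hypothetical, the sketch assumes GNU sed (for the -i option) and whitespace-separated "property value" lines, and it appends the entry only when the property doesn't already contain it, so it is safe to run more than once.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only.
# Append a class path entry to spark.driver.extraClassPath and
# spark.executor.extraClassPath in the given properties file.
# Usage: append_extra_classpath <conf-file> <jar-path>
append_extra_classpath() {
  local conf_file="$1"
  local jar_path="$2"
  local prop
  for prop in spark.driver.extraClassPath spark.executor.extraClassPath; do
    if grep -q "^${prop}[[:space:]]" "$conf_file"; then
      # Property exists: append only if the entry is not already present.
      if ! grep "^${prop}[[:space:]]" "$conf_file" | grep -qF -- "$jar_path"; then
        sed -i "/^${prop}[[:space:]]/s|\$|:${jar_path}|" "$conf_file"
      fi
    else
      # Property is missing: add it with the entry as its only value.
      printf '%s %s\n' "$prop" "$jar_path" >> "$conf_file"
    fi
  done
}
```

Because /etc/spark/conf/spark-defaults.conf is typically writable only by root, try the function on a copy of the file first, and then apply the result with sudo.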
Add your custom JAR path on a new cluster
To add your custom JAR path to existing class paths in /etc/spark/conf/spark-defaults.conf, supply a configuration object when you create a new cluster. Use Amazon EMR versions 5.14.0 or later to create a new cluster.
For Amazon EMR 5.14.0 to Amazon EMR 5.17.0, include the following:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*",
      "spark.executor.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*"
    }
  }
]
For Amazon EMR 5.17.0 to Amazon EMR 5.18.0, include /usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar as the additional JAR path:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/extrajars/*",
      "spark.executor.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/extrajars/*"
    }
  }
]
For Amazon EMR 5.19.0 to Amazon EMR 5.32.0, update the JAR path to the following:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/extrajars/*",
      "spark.executor.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/extrajars/*"
    }
  }
]
For Amazon EMR 5.33.0 to Amazon EMR 5.36.0, update the JAR path to the following:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/extrajars/",
      "spark.executor.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/home/hadoop/extrajars/"
    }
  }
]
For Amazon EMR versions 6.0.0 and later, you can't use a configuration object to update the JAR path. The spark-defaults.conf file includes several JAR paths, and the length of each property value that you update can't exceed 1024 characters. To pass the custom JAR location to spark-defaults.conf, add a bootstrap action instead. For more information, see How do I update all Amazon EMR nodes after the bootstrap phase?
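To see whether a class path value would run into the 1024-character limit mentioned above, you can measure it before you create the cluster. The following is a minimal sketch; the function name check_property_length is hypothetical, and the default limit is taken from the limit stated in this article.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only.
# Succeed if a property value is at most <limit> characters long
# (default 1024); otherwise print a message to stderr and fail.
# Usage: check_property_length <value> [limit]
check_property_length() {
  local value="$1"
  local limit="${2:-1024}"
  local length="${#value}"
  if [ "$length" -gt "$limit" ]; then
    echo "too long: ${length} characters (limit ${limit})" >&2
    return 1
  fi
  echo "ok: ${length} characters"
}
```

Pass the full value that you plan to set for spark.driver.extraClassPath or spark.executor.extraClassPath, including your custom JAR path, as the first argument.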
To add a bootstrap action, see Add custom bootstrap actions, and then take the following actions:
- Replace s3://example-bucket/Bootstraps/script_b.sh with your Amazon Simple Storage Service (Amazon S3) path.
- Replace /home/hadoop/extrajars/* with your custom JAR file path.
- Confirm that the Amazon EMR runtime role has the required permissions to access the Amazon S3 bucket.
Note: When you add the bootstrap script, the script applies to the cluster's Spark configuration instead of a specific job.
Example script to change /etc/spark/conf/spark-defaults.conf:
#!/bin/bash
#
# This is an example of script_b.sh for changing /etc/spark/conf/spark-defaults.conf
#
while [ ! -f /etc/spark/conf/spark-defaults.conf ]
do
  sleep 1
done
#
# Now the file is available, do your work here
#
sudo sed -i '/spark.*.extraClassPath/s/$/:\/home\/hadoop\/extrajars\/\*/' /etc/spark/conf/spark-defaults.conf
exit 0

Launch the EMR cluster and add a bootstrap action similar to the following:

#!/bin/bash
pwd
aws s3 cp s3://example-bucket/Bootstraps/script_b.sh .
chmod +x script_b.sh
nohup ./script_b.sh &
Note: Replace example-bucket with your Amazon S3 bucket.
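The example script waits indefinitely for spark-defaults.conf to appear. If you want the background script to give up after a while, one possible variation is a wait loop with a timeout. This is a sketch only: the function name wait_for_file and the 600-second default are assumptions, not values from Amazon EMR.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only.
# Wait until a file exists, checking once per second, and give up
# after <timeout-seconds> (the default of 600 is an arbitrary choice).
# Usage: wait_for_file <path> [timeout-seconds]
wait_for_file() {
  local path="$1"
  local timeout="${2:-600}"
  local waited=0
  while [ ! -f "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for ${path}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

In script_b.sh, you could replace the while loop with a call such as wait_for_file /etc/spark/conf/spark-defaults.conf || exit 1, and then run the sed command as before.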
Add your custom JAR path when you submit a job
To add your custom JAR path when you submit a job, run the spark-submit command with the --jars option. Place the option before the application JAR, because spark-submit passes everything after the application JAR to the application as arguments. For more information, see Launching applications with spark-submit on the Apache Spark website.
spark-submit --deploy-mode client --class org.apache.spark.examples.SparkPi --master yarn --jars example-custom-jar-path spark-examples.jar 100
Note: Replace example-custom-jar-path with your custom JAR path. To prevent class conflicts, don't include standard JARs when you use the jars option. For example, don't include spark-core.jar because it already exists in the cluster. For more information, see Configure Spark.
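The --jars option takes a comma-separated list, so several custom JARs must be joined with commas and not with colons or spaces. The following is a minimal sketch of one way to build that list; the function name join_jars is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only.
# Join the given JAR paths into the comma-separated list
# that the spark-submit --jars option expects.
# Usage: join_jars <jar> [<jar> ...]
join_jars() {
  local IFS=','
  echo "$*"
}

# Example use (not run here; the shell expands the glob before the call):
#   spark-submit --deploy-mode client --master yarn \
#     --class org.apache.spark.examples.SparkPi \
#     --jars "$(join_jars /home/hadoop/extrajars/*.jar)" \
#     spark-examples.jar 100
```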
Related information
Spark configuration on the Apache Spark website