The error you're encountering, "ClassNotFoundException: com.google.protobuf.GeneratedMessageV3", is likely due to a missing or incompatible protobuf dependency after upgrading your Spark and Scala versions.
To resolve this issue, you need to ensure that the correct version of the protobuf library is available in your Spark application's classpath. Here are some steps you can take:
- Check the protobuf version required by Spark 3.4.1. You may need to add the appropriate protobuf dependency to your project.
- If you're using a build tool like Maven or SBT, add the following dependency (adjust the version if needed): com.google.protobuf:protobuf-java:3.x.x
- If you're submitting the job using spark-submit, you can include the protobuf jar in the --jars option: --jars /path/to/protobuf-java-3.x.x.jar
- Ensure that your UDF and any dependencies it relies on are compatible with Spark 3.4.1 and Scala 2.12.15.
- If you're using EMR 6.14.0, check whether the protobuf library is already included in the EMR classpath. If it is, make sure your application is not introducing a conflicting version.
- You may need to update your SparkSubmitOperator configuration to include the necessary jars or adjust the classpath.
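As a quick diagnostic for the classpath checks above, you can ask the JVM which jar (if any) is supplying a given class on the driver or executor. This is a minimal sketch; the object and method names are illustrative, not part of any Spark API:

```scala
// Minimal classpath diagnostic: report which jar provides a class at runtime.
// Run this (e.g. from a spark-shell or a small main) to see whether
// com.google.protobuf.GeneratedMessageV3 is visible, and from where.
object ClasspathCheck {
  def locate(className: String): Option[String] =
    try {
      val cls = Class.forName(className)
      // getCodeSource is null for JDK bootstrap classes, hence Option(...)
      Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
    } catch {
      case _: ClassNotFoundException => None
    }

  def main(args: Array[String]): Unit =
    println(locate("com.google.protobuf.GeneratedMessageV3")
      .getOrElse("GeneratedMessageV3 not found on classpath"))
}
```

If the printed path points at a protobuf 2.5.0 jar (or prints "not found"), you have located the conflict.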
If the issue persists, you might want to review your entire dependency tree to ensure all libraries are compatible with your new Spark and Scala versions. Sometimes, upgrading major versions can lead to compatibility issues with other dependencies.
Remember to test your changes in a development environment before deploying to production. If you continue to face issues, you may need to consult the specific EMR and Spark documentation for version 6.14.0 and 3.4.1 respectively, as there might be EMR-specific configurations or known issues related to this upgrade.
Sources
- Amazon EMR release 6.14.0 - Amazon EMR
- Troubleshooting Amazon EMR on EKS Spark operator - Amazon EMR
From the shared error message: ... Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3 ...
Understanding the Issue
Based on the error message and the Dinky official documentation [1], the root cause seems to be a missing or incompatible protobuf dependency.
Dinky requires the com.google.protobuf.GeneratedMessageV3 class, which is part of the Protobuf library. However, your EMR cluster is using Protobuf version 2.5.0, which does not include this class.
Proposed Solutions
Upgrade EMR Cluster: If Dinky indeed requires Protobuf 3.x, you can upgrade your EMR cluster to version 6.9.0 or later. These versions use Hadoop 3.3.3, which includes Protobuf 3.7.1 [2].
Replace JAR: Manually replace the protobuf-java-2.5.0.jar with a newer version that includes GeneratedMessageV3. You'll need to do this on all nodes in your cluster. Note: This is not an officially supported solution and may require testing.
Example: Replace /usr/lib/hadoop/lib/protobuf-java-2.5.0.jar with a newer version like protobuf-java-3.24.4.jar.
Important Considerations
Dinky is a Third-Party Library: AWS Support cannot provide direct support for third-party software like Dinky. We can only offer best effort assistance.
Testing: Always test any changes in a non-production environment before deploying them to production.
Dinky Community: If you continue to face issues, consider reaching out to the Dinky development team for more specific guidance.
Additional Notes
Bootstrap Actions: You can use bootstrap actions to automate the process of replacing JAR files on all nodes in your cluster.
Compatibility: Ensure that the new Protobuf version is compatible with other components in your environment.
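A bootstrap action for the jar replacement described above might look like the following sketch. The install path, protobuf version, and download URL are assumptions to verify against your EMR release, and replacing Hadoop's bundled jars is not officially supported:

```shell
#!/bin/bash
# Hypothetical EMR bootstrap action: swap the bundled protobuf 2.5.0 jar
# for a newer one. Verify paths and version before use; this is a sketch.
set -euo pipefail

NEW_VERSION="3.24.4"
JAR_DIR="/usr/lib/hadoop/lib"
JAR_URL="https://repo1.maven.org/maven2/com/google/protobuf/protobuf-java/${NEW_VERSION}/protobuf-java-${NEW_VERSION}.jar"

# Download the replacement jar, then remove the old one.
sudo curl -fsSL -o "${JAR_DIR}/protobuf-java-${NEW_VERSION}.jar" "${JAR_URL}"
sudo rm -f "${JAR_DIR}/protobuf-java-2.5.0.jar"
```

Because bootstrap actions run on every node at provisioning time, this avoids hand-editing each instance.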
I hope this information helps! If you have any further questions, feel free to contact AWS Support.
[1] Dinky official documentation - https://github.com/DataLinkDC/dinky/blob/dev/pom.xml
[2] Hadoop 3.3.3 - https://hadoop.apache.org/docs/r3.3.3/
[3] Bootstrap action basics - https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html#bootstrapUses
[4] What types of questions are supported? - https://aws.amazon.com/premiumsupport/faqs/?nc1=h_ls
Regarding the protobuf version: I am using shading in my build.sbt, so my jar picks the version it requires. I think the issue is that including the Hadoop library in the emr_on_eks config brings its own protobuf version, which overrides the protobuf version in my jar. Is there any way to exclude the protobuf library from SparkSubmitOperator?
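For reference, a shading setup of the kind described here (assuming the sbt-assembly plugin) typically looks like this sketch in build.sbt; the shaded package prefix "shaded.com.google.protobuf" is an arbitrary illustrative choice:

```scala
// build.sbt sketch (requires the sbt-assembly plugin).
// Renames protobuf classes into a private namespace at assembly time so a
// cluster-provided protobuf-java 2.5.0 cannot shadow them at runtime.
assembly / assemblyShadeRules := Seq(
  ShadeRule
    .rename("com.google.protobuf.**" -> "shaded.com.google.protobuf.@1")
    .inAll
)
```

Shading only protects classes inside the assembled jar; anything loaded from the cluster's own classpath (e.g. Hadoop's bundled protobuf) is unaffected, which matches the behavior described in the comment above.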