Resolving classpath issues on EMR


What are the general guidelines for resolving classpath issues on EMR? One common problem when running pipelines on EMR is classpath conflicts involving custom JARs:

Data processing pipelines frequently fail on EMR because they cannot resolve the specific versions of dependent JARs, even though the customer uploaded the required JARs to S3 and pushed them to the EMR master node at cluster creation time. I tried setting the following parameters as part of the pipeline command:

-D mapreduce.task.classpath.user.precedence -D mapreduce.job.user.classpath.first

AWS
asked 4 years ago, 808 views
1 Answer
Accepted Answer

This is a wide topic, and the right approach usually depends on the framework you're using. Generally speaking, for applications where you submit a JAR, such as Spark or MapReduce, the recommended approach is to build a fat JAR that bundles all of its dependencies. This way the JVM loads the correct libraries from your JAR instead of looking for them on the cluster, where it might not find them or might pick up a different version. If a bundled library still clashes with a different version that the framework itself puts on the classpath, you may additionally need to relocate (shade) that library inside the fat JAR.
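When you are unsure which copy of a library actually won, it helps to ask the JVM where it loaded a class from. The sketch below is not part of the original answer; it uses only standard Java (`Class.getProtectionDomain().getCodeSource()`), and the class name `ClassOrigin` is hypothetical. You can run it with the fully qualified names of the classes you suspect, or call `locate(...)` from inside your job and log the result.

```java
import java.security.CodeSource;

/** Reports where the JVM loaded a class from: a JAR, a directory, or "no code source". */
public class ClassOrigin {

    /** Returns the URL of the JAR or directory that supplied cls, if the JVM records one. */
    public static String locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes such as java.lang.String have no code source.
            return "<no code source>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Fully qualified class names to check; the defaults are only for demonstration.
        String[] names = args.length > 0
                ? args
                : new String[] {"java.lang.String", ClassOrigin.class.getName()};
        for (String name : names) {
            System.out.println(name + " -> " + locate(Class.forName(name)));
        }
        // The launch classpath, in order. Frameworks may add JARs through their own
        // class loaders, so this list is not necessarily complete.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the location printed for a dependency class is a cluster-provided JAR rather than your fat JAR, the framework's copy is taking precedence.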

If you're interested, this third-party article (http://tutorials.jenkov.com/maven/maven-build-fat-jar.html) has more details about fat JARs and how to create them.
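Before submitting a job, you can also confirm that a dependency really made it into the fat JAR. This is a minimal sketch using the standard `java.util.jar.JarFile` API; the `FatJarCheck` class and its arguments are illustrative and not taken from the article.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists the .class entries under a package prefix inside a JAR. */
public class FatJarCheck {

    public static List<String> classesUnder(String jarPath, String packagePrefix) throws IOException {
        // JAR entry names use '/' separators, so "com.example" becomes "com/example/".
        String prefix = packagePrefix.replace('.', '/') + "/";
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".class")) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FatJarCheck <path-to-jar> <package.prefix>");
            System.exit(2);
        }
        List<String> found = classesUnder(args[0], args[1]);
        System.out.println(found.size() + " classes under " + args[1] + " in " + args[0]);
    }
}
```

For example, `java FatJarCheck my-pipeline-all.jar com.google.common` (a hypothetical JAR name and package) prints how many classes from that package are bundled; a count of 0 means the dependency was not packaged.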

AWS
answered 4 years ago
