Resolving classpath issues on EMR


What are the general guidelines for resolving classpath issues on EMR? One recurring problem when running pipelines on EMR is classpath conflicts involving custom JARs:

Data processing pipelines frequently fail on EMR because they cannot pick up the specific versions of dependent JARs. This happens even though the customer uploaded the required JARs to S3 and copied them to the EMR master node when the cluster was created. We tried setting the following parameters as part of the pipeline command:

-D mapreduce.task.classpath.user.precedence -D mapreduce.job.user.classpath.first

AWS
asked 4 years ago · 848 views
1 Answer

Accepted answer

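This is a broad topic, and the answer usually depends on the framework you're using. Generally speaking, for applications that you submit as a JAR, such as Spark or MapReduce, the recommended approach is to build a fat JAR that bundles all of the dependencies. The JVM then loads the library versions packaged in your JAR instead of looking for them on the cluster, where it might not find them or might pick up a different version. If a bundled library also exists in another version on the cluster's own classpath, the cluster's copy can still take precedence; relocating (shading) that library's packages inside the fat JAR avoids the clash.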
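
When a job still seems to pick up the wrong version, it helps to confirm which JAR a class was actually loaded from and whether more than one copy of it is visible on the classpath. Below is a minimal sketch using only JDK APIs (`ProtectionDomain.getCodeSource()` and `ClassLoader.getResources()`). The class name `ClasspathProbe` is hypothetical. Pass the fully qualified name of the dependency class you suspect. Run the probe from inside the job code if you can, since the classpath of a running task may differ from the classpath on the master node.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical classpath diagnostic: where does a class come from, and how many copies exist? */
public class ClasspathProbe {

    /** Returns the location (JAR or directory) the named class was loaded from. */
    static String loadedFrom(String className) {
        try {
            // Use the loader that loaded this probe, so the answer reflects the job's own classpath.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no CodeSource.
            return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    /** Lists every classpath location containing the class; more than one hints at a version conflict. */
    static List<String> allCopies(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        List<String> copies = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(resource))) {
            copies.add(url.toString());
        }
        return copies;
    }

    public static void main(String[] args) throws IOException {
        // java.lang.String is only a stand-in default; pass the dependency class you suspect.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println("Loaded from: " + loadedFrom(name));
        List<String> copies = allCopies(name);
        System.out.println(copies.size() + " copies visible on the classpath:");
        copies.forEach(u -> System.out.println("  " + u));
    }
}
```

If `allCopies` reports more than one location, the loader normally uses the first one it finds. That ordering is what the classpath-precedence parameters in the question try to influence.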

For more details on fat JARs and how to build them with Maven, see this third-party article: http://tutorials.jenkov.com/maven/maven-build-fat-jar.html
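
After building the fat JAR, check that the dependency you care about was actually packaged into it. From the command line, `jar tf your.jar` shows this. Below is a minimal sketch of the same check in Java, using only `java.util.jar.JarFile`. The class name `FatJarCheck` and the example arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical helper: lists the files a JAR contains under a given package. */
public class FatJarCheck {

    /** Returns the names of all file entries in the JAR that live under the given package. */
    static List<String> entriesUnder(Path jar, String packageName) throws IOException {
        String prefix = packageName.replace('.', '/') + "/";
        try (JarFile jf = new JarFile(jar.toFile())) {
            // Collect eagerly so the stream is fully consumed before the JarFile is closed.
            return jf.stream()
                    .filter(e -> !e.isDirectory() && e.getName().startsWith(prefix))
                    .map(JarEntry::getName)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FatJarCheck <path-to-fat-jar> <package-name>");
            System.exit(2);
        }
        List<String> found = entriesUnder(Paths.get(args[0]), args[1]);
        System.out.println(found.size() + " entries under " + args[1] + ":");
        found.forEach(name -> System.out.println("  " + name));
    }
}
```

For example, `java FatJarCheck target/my-pipeline.jar com.google.common` (hypothetical JAR path and package) lists the Guava classes bundled in the JAR. If the library was relocated during shading, search for its new package name instead.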

AWS
answered 4 years ago
