Resolving classpath issues on EMR


What are the general guidelines for resolving classpath issues on EMR? One recurring problem when running pipelines on EMR is classpath conflicts involving custom JARs:

Data processing pipelines frequently fail on EMR because they cannot resolve the specific versions of their dependent JARs, even though the customer uploaded the required JARs to S3 and pushed them to the EMR master node at cluster creation time. The customer tried setting the following parameters as part of the pipeline command:

-D mapreduce.task.classpath.user.precedence -D mapreduce.job.user.classpath.first

AWS
asked 4 years ago, 848 views
1 Answer
Accepted Answer

This is a wide topic, and the answer usually depends on the framework you're using. Generally speaking, for applications that you submit as a JAR, such as Spark or MapReduce (MR) jobs, the recommended approach is to build a fat JAR with all the dependencies inside. This way the JVM picks the correct libraries from the JAR instead of looking for them on the cluster, where it might not find them or might pick up a wrong version.

If you're interested, this third-party article [ http://tutorials.jenkov.com/maven/maven-build-fat-jar.html ] explains fat JARs in more detail and shows how to create them.
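
When it is unclear whether the cluster or the fat JAR is supplying a given library, it can help to ask the JVM directly. The following is a minimal diagnostic sketch, not part of EMR or Hadoop: the class name ClasspathProbe and its helper methods are hypothetical, and it uses only standard JDK calls. It prints the location a class was actually loaded from, plus every copy of that class visible on the classpath.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not an EMR or Hadoop API.
public class ClasspathProbe {

    /** Returns the JAR or directory a class was loaded from. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the JDK itself (e.g. java.lang.String) have no code source.
        if (source == null || source.getLocation() == null) {
            return "<bootstrap/JDK>";
        }
        return source.getLocation().toString();
    }

    /** Converts a fully qualified class name into its classpath resource path. */
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every copy of the class that the current class loader can see. */
    static List<URL> copiesOf(String className) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourcePath(className)));
    }

    public static void main(String[] args) throws Exception {
        // Pass the class you suspect is conflicting; defaults to this class.
        String name = args.length > 0 ? args[0] : ClasspathProbe.class.getName();
        Class<?> type = Class.forName(name);
        System.out.println(name + " loaded from " + locationOf(type));
        for (URL copy : copiesOf(name)) {
            System.out.println("  visible copy: " + copy);
        }
    }
}
```

If more than one copy is listed, or the loaded location points at a cluster directory rather than your own JAR, that suggests a version on the cluster is shadowing the one you shipped. Running the probe with the same classpath as the failing job gives the most meaningful result.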

AWS
answered 4 years ago
