EMR Serverless: submitting a job using command-runner.jar?


Hi! I'm excited to see EMR Serverless. I went through the quick start and was able to run the word count example. Pretty cool.

Now I would like to submit a Spark job using command-runner.jar, as I do on a regular EMR cluster, but when submitting a job, EMR Studio asks for a script S3 URI. It is not a script; it is a JAR. On a regular EMR cluster, I use command-runner.jar to call spark-submit with another JAR.

I looked for documentation, but this all seems pretty new and I couldn't find any. How can I submit a job using command-runner.jar? Or do I submit the job directly?

Any tips would be highly appreciated!

asked a year ago · 1128 views
1 Answer

In EMR Serverless, you don't need command-runner.jar. There is no cluster or step to run it on; instead, you submit the Spark job directly with the aws emr-serverless start-job-run command, passing your JAR as the entry point and your spark-submit options as Spark submit parameters. The script S3 URI that EMR Studio asks for is this same entry point, so you can enter the S3 URI of your JAR there. Here is an example command:

aws emr-serverless start-job-run \
--application-id <application-id> \
--name my-job \
--execution-role-arn arn:aws:iam::012345678910:role/emr-serverless-job-role \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "local:///usr/lib/spark/examples/jars/spark-examples.jar", "entryPointArguments": ["10"], "sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi"}}'

In the --job-driver option, entryPoint is the JAR to run, entryPointArguments are the arguments passed to its main method, and sparkSubmitParameters holds the spark-submit options such as --class and --conf. In this example, we're running the SparkPi example that ships with EMR Serverless, using 10 partitions.

You can put any other options you would normally pass to spark-submit in sparkSubmitParameters. For example, to run your own JAR from S3 with two arguments:

aws emr-serverless start-job-run \
--application-id <application-id> \
--name my-job \
--execution-role-arn arn:aws:iam::012345678910:role/emr-serverless-job-role \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/my-job.jar", "entryPointArguments": ["arg1", "arg2"], "sparkSubmitParameters": "--class com.example.MyJob"}}'
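If you have existing command-runner.jar steps, the arguments after spark-submit map onto the --job-driver JSON of aws emr-serverless start-job-run: the leading --options go into sparkSubmitParameters, the first non-option argument becomes entryPoint, and everything after it becomes entryPointArguments. Below is a minimal sketch of that translation as a POSIX shell function. The name to_job_driver is hypothetical, and it assumes every option takes exactly one value and that no argument contains double quotes or backslashes (it does no JSON escaping). Options are copied through verbatim, so remove any that don't apply to EMR Serverless.

```shell
#!/bin/sh
# Hypothetical helper: translate spark-submit-style arguments (as previously
# passed to command-runner.jar) into the JSON for
# `aws emr-serverless start-job-run --job-driver`.
# Assumptions: every --option takes exactly one value, and no argument
# contains double quotes or backslashes (no JSON escaping is done).
to_job_driver() {
  # Accept the arguments with or without a leading "spark-submit".
  if [ "${1:-}" = "spark-submit" ]; then shift; fi

  # Collect "--option value" pairs until the first non-option argument.
  params=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --*) [ $# -ge 2 ] || return 1
           params="${params:+$params }$1 $2"
           shift 2 ;;
      *)   break ;;
    esac
  done

  # The first non-option argument is the application JAR (the entry point).
  [ $# -gt 0 ] || return 1
  entry_point="$1"
  shift

  # Everything after the JAR is passed to the application's main method.
  args_json=""
  for a in "$@"; do
    args_json="${args_json:+$args_json, }\"$a\""
  done

  printf '{"sparkSubmit": {"entryPoint": "%s", "entryPointArguments": [%s], "sparkSubmitParameters": "%s"}}\n' \
    "$entry_point" "$args_json" "$params"
}
```

For example, passing --job-driver "$(to_job_driver spark-submit --class com.example.MyJob s3://my-bucket/my-job.jar arg1 arg2)" to aws emr-serverless start-job-run runs com.example.MyJob from s3://my-bucket/my-job.jar with arg1 and arg2.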

Note that the JAR file must be stored in S3 at a location that the job's execution role (the role passed in --execution-role-arn) has permission to read.
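Because the execution role is what reads the JAR from S3, it needs at least s3:GetObject on the JAR's object. As a minimal sketch, the hypothetical helper s3_read_policy below prints a standard identity-based IAM policy document that grants read access to one bucket (my-bucket is a placeholder). It is an ordinary S3 read policy, not an EMR-specific one, and your role may need other permissions for the job itself.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal IAM policy document that lets a role
# list one S3 bucket and read the objects in it. $1 is the bucket name.
s3_read_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

You could then attach it as an inline policy with aws iam put-role-policy --role-name emr-serverless-job-role --policy-name read-job-jar --policy-document "$(s3_read_policy my-bucket)", where both the role name and the policy name are hypothetical.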

hash
answered a year ago
