EMR Serverless submitting a job using command-runner.jar?

Hi! I'm excited to see EMR Serverless. I went through the quick start and was able to run the word count, pretty cool.

Now I would like to submit a Spark job using command-runner.jar, as I do on a regular EMR cluster, but when I submit a job, EMR Studio asks for the script S3 URI. My job is not a script; it is a JAR, and on a regular EMR cluster I use command-runner.jar to call spark-submit with another JAR.

I looked for documentation, but this all seems pretty new and I couldn't find any. How can I submit a job using command-runner.jar? Or do I submit the job directly?

Any tips would be highly appreciated!

asked a year ago, 1093 views
1 Answer

EMR Serverless does not use command-runner.jar or steps; you submit the job directly with the aws emr-serverless start-job-run command. (The aws emr-containers commands belong to EMR on EKS, which is a different service.) The JAR goes in the entryPoint field of the sparkSubmit job driver, and the script location field in EMR Studio should correspond to that same entry point, so a JAR's S3 URI can go there. Here is an example command:

aws emr-serverless start-job-run \
--application-id <application-id> \
--execution-role-arn arn:aws:iam::012345678910:role/emr-serverless-job-role \
--name my-job \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "/usr/lib/spark/examples/jars/spark-examples.jar", "entryPointArguments": ["10"], "sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi"}}'

The --job-driver option takes the place of the spark-submit command line. entryPoint is the JAR (or script) to run, entryPointArguments are the arguments passed to your application, and sparkSubmitParameters holds the options you would normally pass to spark-submit, such as --class or --conf. In this example, we're running the SparkPi example with 10 partitions.

To run your own JAR, point entryPoint at its S3 location and set the main class:

aws emr-serverless start-job-run \
--application-id <application-id> \
--execution-role-arn arn:aws:iam::012345678910:role/emr-serverless-job-role \
--name my-job \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/my-job.jar", "entryPointArguments": ["arg1", "arg2"], "sparkSubmitParameters": "--class com.example.MyJob"}}'

Note that the JAR file needs to be in an S3 bucket that the job's execution role can read.
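
If you would rather submit from code than from the CLI, here is a minimal sketch using boto3's emr-serverless client. The application ID, role ARN, bucket, and class name are placeholders, and the helper names build_start_job_run_request and submit are made up for this example. It assumes the application already exists and that AWS credentials are configured.

```python
import json


def build_start_job_run_request(application_id, execution_role_arn, jar_uri,
                                main_class, job_args=(), name="my-job"):
    """Build the keyword arguments for the EMR Serverless StartJobRun call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": name,
        "jobDriver": {
            "sparkSubmit": {
                # The JAR takes the place of the "script" location.
                "entryPoint": jar_uri,
                # Arguments passed to the application's main method.
                "entryPointArguments": [str(a) for a in job_args],
                # Options you would otherwise pass to spark-submit.
                "sparkSubmitParameters": f"--class {main_class}",
            }
        },
    }


def submit(request, region="us-west-2"):
    """Send the request with boto3 and return the job run ID."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("emr-serverless", region_name=region)
    response = client.start_job_run(**request)
    return response["jobRunId"]


# Placeholder values; replace them with your own before calling submit().
request = build_start_job_run_request(
    application_id="my-application-id",
    execution_role_arn="arn:aws:iam::012345678910:role/emr-serverless-job-role",
    jar_uri="s3://my-bucket/my-job.jar",
    main_class="com.example.MyJob",
    job_args=["arg1", "arg2"],
)
print(json.dumps(request["jobDriver"], indent=2))
```

Calling submit(request) should start the job run and give you back its ID, which you can then look up in EMR Studio or with aws emr-serverless get-job-run.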

hash
answered a year ago
