EMR Serverless submitting a job using command-runner.jar?


Hi! I'm excited to see EMR Serverless. I went through the quick start and was able to run the word count, pretty cool.

Now I would like to submit a Spark job using command-runner.jar, as I do on a regular EMR cluster, but when I submit a job, EMR Studio asks for the script's S3 URI. My job is not a script; it is a JAR, and on a regular EMR cluster I use command-runner.jar to call spark-submit with another JAR.

I looked for documentation, but this all seems pretty new and I couldn't find any. How can I submit a job using command-runner.jar? Or do I submit the job directly?

Any tips would be highly appreciated!

asked a year ago, 1128 views
1 Answer

EMR Serverless has no cluster to add steps to, so command-runner.jar is not used there. You submit the JAR directly: the job's entry point is your JAR, and the options you would normally pass to spark-submit (such as --class) go into sparkSubmitParameters. From the AWS CLI, the command is aws emr-serverless start-job-run. (The aws emr-containers commands belong to EMR on EKS, which is a different service.) Here is an example command:

aws emr-serverless start-job-run \
--application-id <your-application-id> \
--execution-role-arn arn:aws:iam::012345678910:role/my-job-role \
--name my-job \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "/usr/lib/spark/examples/jars/spark-examples.jar", "entryPointArguments": ["10"], "sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi"}}'

In the --job-driver option, entryPoint is the JAR to run, entryPointArguments are the arguments passed to its main class, and sparkSubmitParameters holds the spark-submit options. In this example, we're running the SparkPi example with 10 partitions.

To run your own code, point entryPoint at your JAR in S3 and pass your main class and arguments the same way:

aws emr-serverless start-job-run \
--application-id <your-application-id> \
--execution-role-arn arn:aws:iam::012345678910:role/my-job-role \
--name my-job \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/my-job.jar", "entryPointArguments": ["arg1", "arg2"], "sparkSubmitParameters": "--class com.example.MyJob"}}'

Note that the JAR file needs to be located in an S3 bucket that the job's execution role can read. As far as I know, the same applies in EMR Studio: the script location field should accept the S3 URI of a JAR, with the main class supplied through the Spark properties.
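
If you are porting several command-runner.jar steps, it may help to generate the job driver JSON instead of writing it by hand. The following is only a rough sketch, assuming the aws emr-serverless start-job-run command and its sparkSubmit job driver; the function names build_job_driver and submit_job, the class name, and the bucket are made up for illustration, and no JSON escaping is done, so arguments containing double quotes or backslashes would need extra handling.

```shell
#!/usr/bin/env bash
# Sketch: build the --job-driver JSON for `aws emr-serverless start-job-run`
# from the pieces of a spark-submit call. All names below are placeholders.

# build_job_driver MAIN_CLASS JAR_URI [ARGS...]
# Prints a sparkSubmit job driver as JSON. Arguments are assumed to contain
# no double quotes or backslashes (no JSON escaping is done here).
build_job_driver() {
  local main_class="$1" jar_uri="$2"
  shift 2
  local args_json="" arg
  for arg in "$@"; do
    # Append each argument as a quoted JSON string, comma-separated.
    args_json="${args_json:+$args_json, }\"$arg\""
  done
  printf '{"sparkSubmit": {"entryPoint": "%s", "entryPointArguments": [%s], "sparkSubmitParameters": "--class %s"}}\n' \
    "$jar_uri" "$args_json" "$main_class"
}

# submit_job APPLICATION_ID ROLE_ARN MAIN_CLASS JAR_URI [ARGS...]
# Needs the AWS CLI and credentials; defined here but not called.
submit_job() {
  local app_id="$1" role_arn="$2"
  shift 2
  aws emr-serverless start-job-run \
    --application-id "$app_id" \
    --execution-role-arn "$role_arn" \
    --job-driver "$(build_job_driver "$@")"
}

# Print the JSON only (safe to run without AWS access).
build_job_driver com.example.MyJob s3://my-bucket/my-job.jar arg1 arg2
```

Running the script as-is only prints the JSON, so you can inspect it before calling submit_job with your own application ID and execution role ARN.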

hash
answered a year ago
