EMR Serverless submitting a job using command-runner.jar?

Hi! I'm excited to see EMR Serverless. I went through the quick start and was able to run the word count, pretty cool.

Now I would like to submit a Spark job using command-runner.jar, like I do on a regular EMR cluster, but when submitting a job EMR Studio asks for the script S3 URI. It is not a script, it is a JAR, and on a regular EMR cluster I use command-runner.jar to call spark-submit with another JAR.

I looked for documentation, but this all seems pretty new and I couldn't find any. How can I submit a job using command-runner.jar? Or do I submit the job directly?

Any tips would be highly appreciated!

asked a year ago, 1128 views
1 Answer
EMR Serverless does not use command-runner.jar, and you do not invoke spark-submit yourself. You submit the job directly with the aws emr-serverless start-job-run command and describe the Spark job in the --job-driver option. Here is an example command:

aws emr-serverless start-job-run \
--application-id <application-id> \
--name my-job \
--execution-role-arn arn:aws:iam::012345678910:role/emr-containers \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "/usr/lib/spark/examples/jars/spark-examples.jar", "entryPointArguments": ["10"], "sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi"}}'

In the --job-driver option, entryPoint is the JAR (or script) to run, entryPointArguments are the arguments passed to your application, and sparkSubmitParameters holds the options you would normally pass to spark-submit. In this example, we're running the SparkPi example with 10 partitions.

To run your own JAR, set entryPoint to its S3 URI and name the main class with --class:

aws emr-serverless start-job-run \
--application-id <application-id> \
--name my-job \
--execution-role-arn arn:aws:iam::012345678910:role/emr-containers \
--region us-west-2 \
--job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/my-job.jar", "entryPointArguments": ["arg1", "arg2"], "sparkSubmitParameters": "--class com.example.MyJob"}}'

Note that the JAR file needs to be located in an S3 bucket that the job's execution role can read. Here <application-id> is a placeholder for the ID of your EMR Serverless application.
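
If quoting the JSON by hand gets error-prone, one option is to build it in a small script. The following is only a sketch of how the arguments for aws emr-serverless start-job-run could be assembled for a JAR job. The function name build_start_job_run_command, the application ID, and the bucket and class names are made up for illustration; only the sparkSubmit fields (entryPoint, entryPointArguments, sparkSubmitParameters) come from the EMR Serverless job driver format.

```python
import json
import shlex


def build_start_job_run_command(application_id, execution_role_arn, jar_uri,
                                main_class, job_args, name="my-job"):
    """Return the argv list for `aws emr-serverless start-job-run`.

    Hypothetical helper: it only assembles arguments and does not call AWS.
    """
    job_driver = {
        "sparkSubmit": {
            # The JAR takes the place of the "script" the console asks for.
            "entryPoint": jar_uri,
            # Arguments handed to the application's main method.
            "entryPointArguments": list(job_args),
            # Options that would otherwise go on the spark-submit command line.
            "sparkSubmitParameters": f"--class {main_class}",
        }
    }
    return [
        "aws", "emr-serverless", "start-job-run",
        "--application-id", application_id,
        "--execution-role-arn", execution_role_arn,
        "--name", name,
        "--job-driver", json.dumps(job_driver),
    ]


if __name__ == "__main__":
    argv = build_start_job_run_command(
        application_id="<application-id>",  # placeholder, use your own
        execution_role_arn="arn:aws:iam::012345678910:role/emr-containers",
        jar_uri="s3://my-bucket/my-job.jar",  # example bucket and key
        main_class="com.example.MyJob",       # example class name
        job_args=["arg1", "arg2"],
    )
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(argv))
```

Letting json.dumps produce the --job-driver value avoids mismatched quotes, and shlex.join takes care of the shell quoting around it.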

hash
answered a year ago