Submit EMR Serverless jobs from a SageMaker notebook


I am processing a dataset and need to submit a job to EMR Serverless so that the dataset is processed in a distributed way. I have created an application in EMR Studio and would like to submit jobs to that application. I found this command for submitting jobs:

aws emr-serverless start-job-run \
    --application-id application-id \
    --execution-role-arn job-role-arn \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
        }
    }'

But how can I run the above command from a Python 3 (Data Science) notebook in SageMaker Studio? Basically, what endpoint do I need to use to submit the job?

Asked 2 years ago, viewed 1573 times
2 answers

Hello,

Instead of using the CLI to submit your job, have you tried the boto3 Python library? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-serverless.html. All of the configuration parameters you've shared can be passed to the boto3 EMR Serverless client.
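As a rough illustration of the boto3 route, here is a minimal sketch that mirrors the CLI command from the question using the `emr-serverless` client's `start_job_run` call. The helper names (`build_job_driver`, `submit_job`, `run_example`) are hypothetical, and `application-id` and `job-role-arn` are the same placeholders used in the question. It also assumes the notebook's execution role is allowed to call `emr-serverless:StartJobRun` and to pass the job role, and that the notebook's default region matches the application's region.

```python
def build_job_driver(entry_point, arguments, spark_params):
    """Build the jobDriver payload, mirroring the CLI's --job-driver JSON."""
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": list(arguments),
            "sparkSubmitParameters": spark_params,
        }
    }


def submit_job(client, application_id, execution_role_arn, job_driver):
    """Start a job run on an existing application and return its jobRunId."""
    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver=job_driver,
    )
    return response["jobRunId"]


def run_example():
    """Hypothetical notebook entry point; replace the placeholders first."""
    import boto3  # imported lazily so the helpers above work without it

    # Assumption: the region comes from the notebook's default AWS config.
    client = boto3.client("emr-serverless")
    job_driver = build_job_driver(
        entry_point=(
            "s3://us-east-1.elasticmapreduce/emr-containers/"
            "samples/wordcount/scripts/wordcount.py"
        ),
        arguments=["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
        spark_params=(
            "--conf spark.executor.cores=1 --conf spark.executor.memory=4g "
            "--conf spark.driver.cores=1 --conf spark.driver.memory=4g "
            "--conf spark.executor.instances=1"
        ),
    )
    job_run_id = submit_job(client, "application-id", "job-role-arn", job_driver)

    # get_job_run reports the current state of the run.
    state = client.get_job_run(
        applicationId="application-id", jobRunId=job_run_id
    )["jobRun"]["state"]
    return job_run_id, state
```

Calling `run_example()` in a notebook cell should return the job run ID together with its current state. There is no separate endpoint to configure in this sketch: boto3 resolves the regional EMR Serverless endpoint from the client's region.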

AWS Expert Chris_G
Answered 2 years ago

Hello,

You can use either of the methods below to submit jobs to EMR Serverless.

=>Running jobs from the EMR Studio console

=>Running jobs from the AWS CLI

https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs.html

Submission of EMR Serverless jobs from a SageMaker notebook is not supported yet.

AWS Support Engineer
Answered 2 years ago
