Submit EMR serverless jobs from SageMaker notebook

I am processing a dataset and need to submit a job to EMR Serverless so that the dataset is processed in a distributed way. I have created an application in EMR Studio and would like to submit jobs to that application. I found this command to submit jobs:

aws emr-serverless start-job-run \
    --application-id application-id \
    --execution-role-arn job-role-arn \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
        }
    }'

But how can I run the above command from a Python 3 (Data Science) notebook in SageMaker Studio? Basically, what endpoint do I need to use to submit the job?
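One way to run the command above from a Python cell is to shell out to the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed in the notebook's kernel image and that the notebook's execution role is allowed to start EMR Serverless job runs; `build_start_job_run_cmd` and `run_cli` are hypothetical helper names, and only the CLI flags from the question (plus the standard `--output json`) are used.

```python
import json
import subprocess


def build_start_job_run_cmd(application_id, execution_role_arn, entry_point,
                            entry_point_args, spark_submit_params):
    """Return the argv list for `aws emr-serverless start-job-run`."""
    job_driver = {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": list(entry_point_args),
            "sparkSubmitParameters": spark_submit_params,
        }
    }
    return [
        "aws", "emr-serverless", "start-job-run",
        "--application-id", application_id,
        "--execution-role-arn", execution_role_arn,
        # json.dumps produces the --job-driver value without any shell quoting
        "--job-driver", json.dumps(job_driver),
        "--output", "json",
    ]


def run_cli(cmd):
    """Run the command (no shell involved) and parse the CLI's JSON response."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `run_cli(build_start_job_run_cmd("application-id", "job-role-arn", ...))["jobRunId"]` would return the ID of the new run. Passing an argv list instead of a shell string avoids having to quote the multi-line JSON literal; in a Jupyter cell, the `!aws emr-serverless start-job-run ...` shell escape is another option.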

Asked a year ago · 1478 views
2 answers

Hello,

Instead of using the CLI to submit your job, have you tried using the boto3 Python library? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-serverless.html . All of the configuration parameters you've shared can be passed to the boto3 EMRServerless client.
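A minimal sketch of the boto3 approach, assuming boto3 is available in the SageMaker Studio kernel and that the notebook's execution role may call `emr-serverless:StartJobRun` and `emr-serverless:GetJobRun` (and typically `iam:PassRole` on the job role). `start_job_run` and `get_job_run` are methods of the boto3 `emr-serverless` client; the helpers `build_job_driver`, `submit_spark_job`, `wait_for_job_run` and `main` are hypothetical names, and the region and placeholder IDs are taken from the question. On the endpoint question: the client resolves the regional EMR Serverless endpoint from `region_name`, so no endpoint URL has to be set by hand.

```python
import time


def build_job_driver(entry_point, entry_point_args, spark_submit_params):
    """Build the jobDriver dict; same shape as the CLI's --job-driver JSON."""
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": list(entry_point_args),
            "sparkSubmitParameters": spark_submit_params,
        }
    }


def submit_spark_job(client, application_id, execution_role_arn, job_driver, name=None):
    """Call StartJobRun on an emr-serverless client and return the new job run ID."""
    kwargs = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": job_driver,
    }
    if name is not None:
        kwargs["name"] = name
    # clientToken (the idempotency token) is filled in automatically by botocore
    response = client.start_job_run(**kwargs)
    return response["jobRunId"]


TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job_run(client, application_id, job_run_id,
                     poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll GetJobRun until the run reaches a terminal state; return that state."""
    waited = 0
    while True:
        state = client.get_job_run(
            applicationId=application_id, jobRunId=job_run_id
        )["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job run {job_run_id} still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


def main():
    """Example notebook usage; replace the placeholder IDs before calling."""
    import boto3  # imported here so the helpers above work without boto3 installed

    emr = boto3.client("emr-serverless", region_name="us-east-1")
    driver = build_job_driver(
        "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
        ["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
        "--conf spark.executor.cores=1 --conf spark.executor.memory=4g "
        "--conf spark.driver.cores=1 --conf spark.driver.memory=4g "
        "--conf spark.executor.instances=1",
    )
    job_run_id = submit_spark_job(emr, "application-id", "job-role-arn", driver)
    print("Started job run:", job_run_id)
    print("Final state:", wait_for_job_run(emr, "application-id", job_run_id))
```

Passing the client in as an argument keeps the helpers independent of AWS, so they can be exercised with a stand-in object; in the notebook you would fill in the real application ID and role ARN and then call `main()`.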

AWS
Expert
Chris_G
answered a year ago

Hello,

You can use the following methods to submit jobs to EMR Serverless:

=>Running jobs from the EMR Studio console

=>Running jobs from the AWS CLI

https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs.html

Submitting EMR Serverless jobs from a SageMaker notebook is not supported yet.

AWS
Support Engineer
answered a year ago