Submit EMR Serverless jobs from a SageMaker notebook

I am processing a dataset and need to submit a job to EMR Serverless so that the dataset is processed in a distributed way. I have created an application in EMR Studio and would like to submit jobs to it. I found the following command for submitting jobs:

aws emr-serverless start-job-run \
    --application-id application-id \
    --execution-role-arn job-role-arn \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
        }
    }'

But how can I run the above command from a Python 3 Data Science notebook in SageMaker Studio? Basically, what endpoint do I need to use to submit the job?

2 Answers

Hello,

Instead of using the CLI to submit your job, have you tried the boto3 Python library? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-serverless.html . All of the configuration parameters you've shared can be passed to the EMR Serverless client in boto3.
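
As a rough illustration of the boto3 route, the CLI command from the question could be translated along the lines below. This is a minimal sketch, not a tested recipe for SageMaker Studio: the helper names `build_job_request` and `submit_job` are made up for this example, the default region `us-east-1` is an assumption, and the application ID and role ARN are placeholders you would replace with your own. It also assumes the notebook's execution role is allowed to call `emr-serverless:StartJobRun` and to pass the job role. No endpoint URL is set explicitly; boto3 picks the regional endpoint from `region_name`.

```python
from typing import Any, Dict, List

# Spark settings copied from the CLI command in the question.
SPARK_PARAMS = (
    "--conf spark.executor.cores=1 --conf spark.executor.memory=4g "
    "--conf spark.driver.cores=1 --conf spark.driver.memory=4g "
    "--conf spark.executor.instances=1"
)


def build_job_request(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    entry_point_args: List[str],
    spark_params: str = SPARK_PARAMS,
) -> Dict[str, Any]:
    """Build the keyword arguments for start_job_run (hypothetical helper).

    The keys mirror the CLI flags: --application-id, --execution-role-arn
    and the JSON passed to --job-driver.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(entry_point_args),
                "sparkSubmitParameters": spark_params,
            }
        },
    }


def submit_job(request: Dict[str, Any], region_name: str = "us-east-1") -> str:
    """Submit the job and return its job run ID (hypothetical helper).

    region_name is an assumption; use the region of your application.
    """
    # Imported here so the request builder can be used without boto3.
    import boto3

    client = boto3.client("emr-serverless", region_name=region_name)
    response = client.start_job_run(**request)
    return response["jobRunId"]
```

In a notebook cell, something like `submit_job(build_job_request("application-id", "job-role-arn", "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py", ["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"]))` would then start the run. The returned ID can be passed to the client's `get_job_run` call to check the job state.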

MODERATOR
Chris_G
answered 2 months ago

Hello,

You can use either of the methods below to submit a job to EMR Serverless:

=> Running jobs from the EMR Studio console

=> Running jobs from the AWS CLI

https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs.html

Submission of EMR Serverless jobs from a SageMaker notebook is not supported yet.

SUPPORT ENGINEER
answered 2 months ago
