Submit EMR Serverless jobs from a SageMaker notebook


I am processing a dataset and need to submit a job to EMR Serverless so that the dataset is processed in a distributed way. I have created an application in EMR Studio and would like to submit jobs to it. I found this command for submitting jobs:

aws emr-serverless start-job-run \
    --application-id application-id \
    --execution-role-arn job-role-arn \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
        }
    }'

But how can I run the above command from a Python 3 Data Science notebook in SageMaker Studio? Basically, what endpoint do I need to use to submit the job?

asked 2 years ago, 1596 views
2 Answers

Hello,

Instead of using the CLI to submit your job, have you tried the boto3 Python library? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-serverless.html . All of the configuration parameters you've shared can be passed to the start_job_run call of the boto3 EMRServerless client.
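As a rough sketch of what that could look like from a notebook cell: the boto3 "emr-serverless" client exposes start_job_run, which takes the same application ID, execution role ARN, and job driver as the CLI command in the question. The helper names below (build_job_driver, submit_job, make_client) are made up for illustration, and the sketch assumes the notebook's execution role is allowed to call EMR Serverless and to pass the job role. No endpoint has to be set by hand, since boto3 resolves it from the region.

```python
def build_job_driver(entry_point, entry_point_args, spark_params):
    """Build the jobDriver dict that mirrors the CLI's --job-driver JSON."""
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": list(entry_point_args),
            "sparkSubmitParameters": spark_params,
        }
    }


def submit_job(client, application_id, execution_role_arn, job_driver):
    """Call StartJobRun on an EMR Serverless client and return the job run ID."""
    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver=job_driver,
    )
    return response["jobRunId"]


def make_client(region_name="us-east-1"):
    """Create the boto3 client; the region here is an assumption, adjust as needed."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    return boto3.client("emr-serverless", region_name=region_name)


# Same values as the CLI example in the question.
job_driver = build_job_driver(
    entry_point="s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
    entry_point_args=["s3://DOC-EXAMPLE-BUCKET-OUTPUT/wordcount_output"],
    spark_params=(
        "--conf spark.executor.cores=1 --conf spark.executor.memory=4g "
        "--conf spark.driver.cores=1 --conf spark.driver.memory=4g "
        "--conf spark.executor.instances=1"
    ),
)

# In the notebook, with your real application ID and job role ARN in place of
# the placeholders:
# client = make_client()
# job_run_id = submit_job(client, "application-id", "job-role-arn", job_driver)
```

Passing the client in as an argument keeps submit_job easy to try out with a stub before pointing it at a real application.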

AWS
EXPERT
Chris_G
answered 2 years ago

Hello,

You can use the methods below to submit jobs to EMR Serverless:

=> Running jobs from the EMR Studio console

=> Running jobs from the AWS CLI

https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs.html

Submission of EMR Serverless jobs from a SageMaker notebook is not supported yet.

AWS
SUPPORT ENGINEER
answered 2 years ago
