How do I use another AWS service to submit an EMR Serverless job?


I want to use another AWS service to submit an Amazon EMR Serverless job.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

To use another AWS service to submit an EMR Serverless job, use the following methods:

AWS Step Functions

To use Step Functions to submit an EMR Serverless job, create a Step Functions state machine that submits the job as a step. The following example state machine definition uses the optimized EMR Serverless integration with the .sync suffix, so the task waits until the job run finishes. The state machine then transitions to the Success state when the job run succeeds, or to the Failure state when the task reports an error.

{
  "Comment": "Submit an EMR Serverless job and wait for it to finish",
  "StartAt": "Submit EMR Serverless Job",
  "States": {
    "Submit EMR Serverless Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
      "Parameters": {
        "ApplicationId": "example-application-id",
        "ExecutionRoleArn": "example-execution-role-arn",
        "JobDriver": {
          "SparkSubmit": {
            "EntryPoint": "example-entry-point",
            "SparkSubmitParameters": "--class example-main-class --jars example-jar-paths"
          }
        },
        "ConfigurationOverrides": {
          "MonitoringConfiguration": {
            "ManagedPersistenceMonitoringConfiguration": {
              "Enabled": true
            }
          }
        }
      },
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Failure"
        }
      ],
      "Next": "Success"
    },
    "Success": {
      "Type": "Succeed"
    },
    "Failure": {
      "Type": "Fail"
    }
  }
}

Note: Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your required values.
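To register a state machine definition and start an execution from a script, you can use the Step Functions API. The following is a minimal sketch, not a definitive implementation. It assumes that sfn_client behaves like boto3.client("stepfunctions") and that the definition is available as a Python dictionary. The function name deploy_and_start and its parameters are hypothetical and are used only for illustration.

```python
import json


def deploy_and_start(sfn_client, name, definition, role_arn, execution_input=None):
    """Create a state machine from `definition` and start one execution.

    `sfn_client` is assumed to behave like boto3.client("stepfunctions").
    `definition` is the state machine definition as a Python dict.
    """
    created = sfn_client.create_state_machine(
        name=name,
        definition=json.dumps(definition),  # the API expects a JSON string
        roleArn=role_arn,  # IAM role that Step Functions assumes
    )
    started = sfn_client.start_execution(
        stateMachineArn=created["stateMachineArn"],
        input=json.dumps(execution_input or {}),
    )
    return started["executionArn"]
```

To call the function, pass boto3.client("stepfunctions") as sfn_client together with your state machine name, definition, and role ARN. Because the client is a parameter, you can also pass a stub object to test the function without calling AWS.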

AWS SDKs

To submit jobs programmatically, use AWS SDKs to interact with the EMR Serverless API. The following is an example of how to use the AWS SDK for Python (Boto3) to submit an EMR Serverless job:

Note: The following Python script uses the Boto3 library to call the start_job_run method of the EMR Serverless API. Then, the job run ID that's returned by the API is printed.

import boto3

# Create a client for the EMR Serverless API.
emr_serverless = boto3.client('emr-serverless')

# Submit a Spark job. EMR Serverless request fields use lowerCamelCase names.
response = emr_serverless.start_job_run(
    applicationId='example-application-id',
    executionRoleArn='example-execution-role-arn',
    jobDriver={
        'sparkSubmit': {
            'entryPoint': 'example-entry-point',
            'sparkSubmitParameters': '--class example-main-class --jars example-jar-paths'
        }
    },
    configurationOverrides={
        'monitoringConfiguration': {
            # Keep managed log persistence turned on for the job run.
            'managedPersistenceMonitoringConfiguration': {'enabled': True}
        }
    }
)

# The response contains the ID of the new job run.
job_run_id = response['jobRunId']
print(f'Submitted EMR Serverless job with ID: {job_run_id}')

Note: Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your required values.
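The start_job_run method returns as soon as the job run is submitted, so a script that depends on the result must check the job run state separately. The following is a minimal sketch of a polling loop, assuming that client behaves like boto3.client('emr-serverless'). The function name wait_for_job_run, the default poll interval, and the poll limit are hypothetical choices for illustration.

```python
import time

# Job run states after which the state no longer changes.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job_run(client, application_id, job_run_id,
                     poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_job_run until the job run reaches a terminal state.

    Returns the final state string. Raises TimeoutError if the job run
    is still in progress after `max_polls` checks.
    """
    for _ in range(max_polls):
        response = client.get_job_run(
            applicationId=application_id,
            jobRunId=job_run_id,
        )
        state = response["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)  # wait before the next check
    raise TimeoutError(
        f"Job run {job_run_id} did not finish after {max_polls} polls"
    )
```

For example, call wait_for_job_run(emr_serverless, 'example-application-id', job_run_id) after you submit the job, and treat any returned value other than SUCCESS as a failed run.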

AWS CLI

To use the AWS CLI to submit an EMR Serverless job, run the following start-job-run command:

aws emr-serverless start-job-run \
    --application-id example-application-id \
    --execution-role-arn example-execution-role-arn \
    --job-driver '{"sparkSubmit": {"entryPoint": "example-entry-point", "sparkSubmitParameters": "--class example-main-class --jars example-jar-paths"}}' \
    --configuration-overrides '{"monitoringConfiguration": {"managedPersistenceMonitoringConfiguration": {"enabled": true}}}'

Note: Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your required values.
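The JSON arguments of the start-job-run command are easy to misquote when you type them by hand. The following is a minimal sketch that builds the command arguments in Python and lets json.dumps and shlex.join handle the quoting. The function name build_start_job_run_command is hypothetical, and the sketch covers only the job driver fields that this article uses.

```python
import json
import shlex


def build_start_job_run_command(application_id, execution_role_arn,
                                entry_point, spark_submit_parameters):
    """Return the argument list for `aws emr-serverless start-job-run`."""
    job_driver = {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": spark_submit_parameters,
        }
    }
    return [
        "aws", "emr-serverless", "start-job-run",
        "--application-id", application_id,
        "--execution-role-arn", execution_role_arn,
        "--job-driver", json.dumps(job_driver),  # serialize once, no manual quoting
    ]


command = build_start_job_run_command(
    "example-application-id",
    "example-execution-role-arn",
    "example-entry-point",
    "--class example-main-class --jars example-jar-paths",
)
# shlex.join quotes each argument so that the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

To run the command from the same script, pass the list to subprocess.run(command, check=True). A list of arguments bypasses the shell, so no additional quoting is required.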

AWS CloudFormation

Use AWS CloudFormation to define and provision EMR Serverless resources. The following example CloudFormation template creates an EMR Serverless application with a specified release label and initial capacity, and returns the application ID as a stack output.

Note: CloudFormation doesn't provide a resource type for job runs. After the stack creates the application, use one of the preceding methods to submit the job to that application.

Resources:
  EMRServerlessApplication:
    Type: AWS::EMRServerless::Application
    Properties:
      ReleaseLabel: emr-6.15.0
      Type: SPARK
      InitialCapacity:
        - Key: Driver
          Value:
            WorkerCount: 1
            WorkerConfiguration:
              Cpu: '2 vCPU'
              Memory: '8 GB'
        - Key: Executor
          Value:
            WorkerCount: 2
            WorkerConfiguration:
              Cpu: '2 vCPU'
              Memory: '8 GB'

Outputs:
  ApplicationId:
    Value: !Ref EMRServerlessApplication

Note: Replace the release label, worker counts, and worker configuration values with your required values.
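After the stack creates the application, a script can look up the application ID and submit a job to it. The following is a minimal sketch, assuming that cfn_client behaves like boto3.client('cloudformation') and emr_client behaves like boto3.client('emr-serverless'). It also assumes that the physical resource ID that CloudFormation reports for the application is the application ID. The function names are hypothetical.

```python
def get_application_id(cfn_client, stack_name,
                       logical_id="EMRServerlessApplication"):
    """Look up the EMR Serverless application that a stack created."""
    response = cfn_client.describe_stack_resource(
        StackName=stack_name,
        LogicalResourceId=logical_id,
    )
    # Assumption: the physical resource ID is the application ID.
    return response["StackResourceDetail"]["PhysicalResourceId"]


def submit_job(emr_client, application_id, execution_role_arn, entry_point):
    """Submit a Spark job to the application and return the job run ID."""
    response = emr_client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver={"sparkSubmit": {"entryPoint": entry_point}},
    )
    return response["jobRunId"]
```

For example, call get_application_id with your stack name, and then pass the result to submit_job together with your execution role ARN and entry point.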

Related information

Running jobs

Run an EMR Serverless job

Orchestrate EMR Serverless jobs with AWS Step Functions

EMR Serverless samples on the GitHub website

AWS OFFICIAL
Updated 3 months ago