How do I use another AWS service to submit an EMR Serverless job?
I want to use another AWS service to submit an Amazon EMR Serverless job.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
To use another AWS service to submit an EMR Serverless job, use the following methods:
AWS Step Functions
To use Step Functions to submit an EMR Serverless job, create a Step Functions state machine that submits the job as a step. The following example state machine definition submits an EMR Serverless job, checks the job run status every 30 seconds, and then transitions to a success or failure state based on the final status. The ClientToken parameter uses the States.UUID() intrinsic function so that each execution sends a unique idempotency token.
    {
      "Comment": "Submit an EMR Serverless job",
      "StartAt": "Submit EMR Serverless Job",
      "States": {
        "Submit EMR Serverless Job": {
          "Type": "Task",
          "Resource": "arn:aws:states:::aws-sdk:emrserverless:startJobRun",
          "Parameters": {
            "ApplicationId": "example-application-id",
            "ClientToken.$": "States.UUID()",
            "ExecutionRoleArn": "example-execution-role-arn",
            "JobDriver": {
              "SparkSubmit": {
                "EntryPoint": "example-entry-point",
                "SparkSubmitParameters": "--class example-main-class --jars example-jar-paths"
              }
            },
            "ConfigurationOverrides": {
              "MonitoringConfiguration": {
                "ManagedPersistenceMonitoringConfiguration": { "Enabled": true }
              }
            }
          },
          "Next": "Wait For Job Run"
        },
        "Wait For Job Run": { "Type": "Wait", "Seconds": 30, "Next": "Get Job Run Status" },
        "Get Job Run Status": {
          "Type": "Task",
          "Resource": "arn:aws:states:::aws-sdk:emrserverless:getJobRun",
          "Parameters": {
            "ApplicationId.$": "$.ApplicationId",
            "JobRunId.$": "$.JobRunId"
          },
          "ResultSelector": {
            "ApplicationId.$": "$.JobRun.ApplicationId",
            "JobRunId.$": "$.JobRun.JobRunId",
            "State.$": "$.JobRun.State"
          },
          "Next": "Job Run Succeeded?"
        },
        "Job Run Succeeded?": {
          "Type": "Choice",
          "Choices": [
            { "Variable": "$.State", "StringEquals": "SUCCESS", "Next": "Success" },
            { "Variable": "$.State", "StringEquals": "FAILED", "Next": "Failure" },
            { "Variable": "$.State", "StringEquals": "CANCELLED", "Next": "Failure" }
          ],
          "Default": "Wait For Job Run"
        },
        "Success": { "Type": "Succeed" },
        "Failure": { "Type": "Fail" }
      }
    }
Note: Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your required values.
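After the state machine exists, something still has to start an execution. The following is a minimal sketch, not a definitive implementation, that uses the Step Functions client from the AWS SDK for Python (Boto3) to start an execution and wait until it stops running. The function name run_state_machine, the polling interval, and the ARN in the usage note are illustrative assumptions.

```python
import json
import time


def run_state_machine(sfn_client, state_machine_arn, payload=None, poll_seconds=15):
    """Start one execution and block until its status is no longer RUNNING.

    sfn_client is assumed to be a Boto3 Step Functions client, for example
    boto3.client("stepfunctions"). It's passed in so that the function can
    be exercised with a stub instead of a real AWS account.
    """
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload or {}),  # execution input must be a JSON string
    )
    execution_arn = started["executionArn"]

    while True:
        status = sfn_client.describe_execution(executionArn=execution_arn)["status"]
        if status != "RUNNING":
            # Return whatever status Step Functions reports, such as SUCCEEDED or FAILED
            return status
        time.sleep(poll_seconds)
```

For example, run_state_machine(boto3.client("stepfunctions"), "example-state-machine-arn") returns the final execution status, where example-state-machine-arn is a placeholder for your state machine's ARN.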
AWS SDKs
To submit jobs programmatically, use AWS SDKs to interact with the EMR Serverless API. The following is an example of how to use the AWS SDK for Python (Boto3) to submit an EMR Serverless job:
Note: The following Python script uses the Boto3 library to call the start_job_run method of the EMR Serverless API. Then, the job run ID that's returned by the API is printed.
    import boto3

    # Create a client for the EMR Serverless API
    emr_serverless = boto3.client('emr-serverless')

    # Submit a Spark job run to an existing EMR Serverless application
    response = emr_serverless.start_job_run(
        applicationId='example-application-id',
        executionRoleArn='example-execution-role-arn',
        jobDriver={
            'sparkSubmit': {
                'entryPoint': 'example-entry-point',
                'sparkSubmitParameters': '--class example-main-class --jars example-jar-paths'
            }
        },
        configurationOverrides={
            'monitoringConfiguration': {
                # Keep job logs in managed storage for the application UIs
                'managedPersistenceMonitoringConfiguration': {'enabled': True}
            }
        }
    )

    # The response contains the ID of the new job run
    job_run_id = response['jobRunId']
    print(f'Submitted EMR Serverless job with ID: {job_run_id}')
Note: Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your required values.
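A submitted job run continues in the background, so a script usually also needs to check how the run ends. The following is a minimal sketch, under stated assumptions, that polls the get_job_run method until the job run reaches a terminal state. The function name wait_for_job_run, the 30-second interval, and the poll limit are illustrative choices, not values that EMR Serverless requires.

```python
import time

# Job run states after which EMR Serverless reports no further changes
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job_run(emr_client, application_id, job_run_id,
                     poll_seconds=30, max_polls=120):
    """Poll get_job_run until the job run is in a terminal state, then return that state.

    emr_client is assumed to be a Boto3 EMR Serverless client, for example
    boto3.client("emr-serverless").
    """
    for _ in range(max_polls):
        job_run = emr_client.get_job_run(
            applicationId=application_id,
            jobRunId=job_run_id,
        )["jobRun"]
        if job_run["state"] in TERMINAL_STATES:
            return job_run["state"]
        time.sleep(poll_seconds)
    # Give up instead of looping forever if the job run never finishes
    raise TimeoutError(f"Job run {job_run_id} did not finish after {max_polls} polls")
```

For example, wait_for_job_run(boto3.client("emr-serverless"), "example-application-id", job_run_id) returns SUCCESS, FAILED, or CANCELLED for the job run that was submitted.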
AWS CLI
To use the AWS CLI to submit an EMR Serverless job, run the following start-job-run command:
    aws emr-serverless start-job-run \
        --application-id example-application-id \
        --execution-role-arn example-execution-role-arn \
        --job-driver '{"sparkSubmit": {"entryPoint": "example-entry-point", "sparkSubmitParameters": "--class example-main-class --jars example-jar-paths"}}' \
        --configuration-overrides '{"monitoringConfiguration": {"managedPersistenceMonitoringConfiguration": {"enabled": true}}}'
Note: Replace example-application-id, example-execution-role-arn, example-entry-point, example-main-class, and example-jar-paths with your required values.
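When a script calls the AWS CLI, the JSON value for --job-driver is easy to misquote. The following is a minimal sketch, assuming that the AWS CLI is installed and configured, that builds the start-job-run command as an argument list and lets json.dumps produce the JSON. The helper names build_start_job_run_command and submit_with_cli are hypothetical.

```python
import json
import subprocess


def build_start_job_run_command(application_id, execution_role_arn,
                                entry_point, spark_params):
    """Return the start-job-run command as an argument list."""
    job_driver = {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": spark_params,
        }
    }
    return [
        "aws", "emr-serverless", "start-job-run",
        "--application-id", application_id,
        "--execution-role-arn", execution_role_arn,
        # json.dumps handles the quoting, so no shell escaping is needed
        "--job-driver", json.dumps(job_driver),
        # Request JSON output so that the result can be parsed
        "--output", "json",
    ]


def submit_with_cli(command):
    """Run the command and return the job run ID from the JSON that the AWS CLI prints."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)["jobRunId"]
```

Because the command is passed as a list and not as a single shell string, the JSON argument reaches the AWS CLI unchanged.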
AWS CloudFormation
Use AWS CloudFormation to define and provision EMR Serverless resources. CloudFormation provides the AWS::EMRServerless::Application resource type for applications but doesn't provide a resource type for job runs. After the stack creates the application, use one of the preceding methods to submit the job. The following is an example of a CloudFormation template that creates an EMR Serverless application with a specified release label and initial capacity.
    Resources:
      EMRServerlessApplication:
        Type: AWS::EMRServerless::Application
        Properties:
          ReleaseLabel: emr-6.6.0
          Type: SPARK
          InitialCapacity:
            # Pre-initialized capacity is defined per worker type
            - Key: Driver
              Value:
                WorkerCount: 1
                WorkerConfiguration:
                  Cpu: '2vCPU'
                  Memory: '8GB'
            - Key: Executor
              Value:
                WorkerCount: 2
                WorkerConfiguration:
                  Cpu: '2vCPU'
                  Memory: '8GB'
Note: Replace the release label, worker counts, and worker configuration with your required values. EMR Serverless supports Amazon EMR release 6.6.0 and later.
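One way to submit a job to an application that a stack created is to look up the application ID from the stack and then call start_job_run. The following is a minimal sketch under two assumptions: the application's logical ID is EMRServerlessApplication, and the physical resource ID that CloudFormation records for it is the EMR Serverless application ID. The function names are hypothetical.

```python
def get_application_id(cfn_client, stack_name, logical_id="EMRServerlessApplication"):
    """Return the physical resource ID that CloudFormation recorded for the application.

    Assumption: for AWS::EMRServerless::Application, this value is the application ID.
    cfn_client is assumed to be boto3.client("cloudformation").
    """
    detail = cfn_client.describe_stack_resource(
        StackName=stack_name,
        LogicalResourceId=logical_id,
    )["StackResourceDetail"]
    return detail["PhysicalResourceId"]


def submit_job_to_stack_application(cfn_client, emr_client, stack_name,
                                    execution_role_arn, entry_point):
    """Look up the application that the stack created and submit a Spark job to it.

    emr_client is assumed to be boto3.client("emr-serverless").
    """
    application_id = get_application_id(cfn_client, stack_name)
    response = emr_client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver={"sparkSubmit": {"entryPoint": entry_point}},
    )
    return response["jobRunId"]
```

Both clients are passed in as arguments, so the same functions work with clients for any Region or with stubs in a test.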
Related information
Orchestrate EMR Serverless jobs with AWS Step Functions
EMR Serverless samples on the GitHub website