How can I ensure EMR processes steps in sequence?


I'm submitting multiple steps with the AWS Python SDK (boto3):

steps = []
for job in jobs:
    args = [
        'spark-submit',
        '--py-files',
        's3://bucket/scripts/*',
        's3://bucket/scripts/main.py',
    ]
    args = args + job.params
    step = {
        'Name': job.name,
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args
        }
    }
    steps.append(step)
response = self.client.add_job_flow_steps(JobFlowId=job_flow_id, Steps=steps)

But EMR does not process the steps in the same order as the steps array. Is there a way to ensure they are processed in sequence?

Asked 2 years ago · 1030 views
1 Answer

I believe steps are submitted and run in the order they appear in the list. To confirm, I tested this on a test cluster using your code with a few small changes:

steps = []
for i in range(1,10):
    args = [
        'spark-example',
        '--deploy-mode',
        'cluster',
        'SparkPi',
        '10'
    ]
    step = {
        'Name': "TestStepOrder" + str(i),
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args
        }
    }
    steps.append(step)
response = client.add_job_flow_steps(JobFlowId=clusterId, Steps=steps)

I can confirm that the steps ran in the order they were submitted, as expected.

I ran a second round of tests with the step concurrency level set to 5 to see whether that has any impact. In this case, looking at the Start Time of each step, I can confirm the order is still maintained.

I'd be interested to know more about how the order gets mixed up in your case. Please share steps to reproduce the behavior you are observing.

Note: I'm using the latest boto3 version (1.20.26); I'm not sure whether that makes any difference.

AWS
Support Engineer
Answered 2 years ago
