How to ensure EMR process steps in sequence?

0

I'm submitting multiple steps with AWS python SDK

steps = []
for job in jobs:
    args = [
        'spark-submit',
        '--py-files',
        's3://bucket/scripts/*',
        's3://bucket/scripts/main.py',
    ]
    args = args + job.params
    step = {
        'Name': job.name,
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args
        }
    }
    steps.append(step)
response = self.client.add_job_flow_steps(JobFlowId=job_flow_id, Steps=steps)

But EMR does not process the step in the sequence as the steps array, is there a way to ensure it processes in sequence?

質問済み 2年前1030ビュー
1回答
0

I believe that steps are submitted and run in the order, so to confirm the same I went ahead and tested the same on a test cluster using your code with little changes as below.

steps = []
for i in range(1,10):
    args = [
        'spark-example',
        '--deploy-mode',
        'cluster',
        'SparkPi',
        '10'
    ]
    step = {
        'Name': "TestStepOrder" + str(i),
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args
        }
    }
    steps.append(step)
response = client.add_job_flow_steps(JobFlowId=clusterId, Steps=steps)

I can confirm the order is maintained as expected.

I ran a second round of tests for the same with concurrency set as 5 to see if that has any impacts on this. In this case by looking at the Start Time I can confirm the order is still maintained.

Interested to know more about how you get the order mixed up, please share reproduction steps to reproduce the behavior you are observing.

Note: I'm using the latest boto3 version(1.20.26), not sure if that makes it any different

AWS
サポートエンジニア
回答済み 2年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ