How to ensure EMR processes steps in sequence?


I'm submitting multiple steps with the AWS Python SDK (boto3):

steps = []
for job in jobs:
    args = [
        'spark-submit',
        '--py-files',
        's3://bucket/scripts/*',
        's3://bucket/scripts/main.py',
    ]
    args = args + job.params
    step = {
        'Name': job.name,
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args
        }
    }
    steps.append(step)
response = self.client.add_job_flow_steps(JobFlowId=job_flow_id, Steps=steps)

But EMR does not process the steps in the same order as the steps array. Is there a way to ensure they are processed in sequence?

asked 2 years ago · 1030 views
1 Answer

I believe steps are submitted and run in the order given. To confirm this, I tested it on a test cluster using your code with small changes, as below.

steps = []
for i in range(1,10):
    args = [
        'spark-example',
        '--deploy-mode',
        'cluster',
        'SparkPi',
        '10'
    ]
    step = {
        'Name': "TestStepOrder" + str(i),
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args
        }
    }
    steps.append(step)
response = client.add_job_flow_steps(JobFlowId=clusterId, Steps=steps)

I can confirm the order is maintained as expected.
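For anyone who wants to script this check instead of reading it off the console, here is a minimal sketch. It assumes `client` is a boto3 EMR client and that `step_ids` is the `StepIds` list returned by `add_job_flow_steps`; the helper names `start_times_in_submitted_order` and `is_in_order` are made up for illustration. It pages through `list_steps` and looks at each step's `StartDateTime`.

```python
def start_times_in_submitted_order(client, cluster_id, step_ids):
    """Return each step's StartDateTime (or None if not started), in submitted order."""
    timeline = {}
    kwargs = {'ClusterId': cluster_id}
    while True:
        page = client.list_steps(**kwargs)
        for step in page['Steps']:
            # StartDateTime is absent while a step is still pending.
            timeline[step['Id']] = step['Status']['Timeline'].get('StartDateTime')
        if not page.get('Marker'):
            break
        kwargs['Marker'] = page['Marker']  # fetch the next page of steps
    return [timeline.get(step_id) for step_id in step_ids]


def is_in_order(start_times):
    """True if the steps that have started did so in non-decreasing time order."""
    started = [t for t in start_times if t is not None]
    return all(a <= b for a, b in zip(started, started[1:]))
```

Typical use would be `is_in_order(start_times_in_submitted_order(client, clusterId, response['StepIds']))` once the steps have started running.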

I ran a second round of tests with step concurrency set to 5 to see whether that has any impact. In this case, looking at the start times, I can confirm the order is still maintained.
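If the steps must run strictly one after another, one option is to keep the cluster's step concurrency level at 1. The sketch below is only an illustration under that assumption: `ensure_sequential_steps` is a hypothetical helper name, `client` is assumed to be a boto3 EMR client, and the cluster is assumed to support step concurrency. It reads `StepConcurrencyLevel` from `describe_cluster` and lowers it with `modify_cluster` if needed.

```python
def ensure_sequential_steps(client, cluster_id):
    """Lower StepConcurrencyLevel to 1 if it is higher; return the level found."""
    cluster = client.describe_cluster(ClusterId=cluster_id)['Cluster']
    # Assumption: a missing value means the default of one step at a time.
    level = cluster.get('StepConcurrencyLevel', 1)
    if level > 1:
        client.modify_cluster(ClusterId=cluster_id, StepConcurrencyLevel=1)
    return level
```

You would call `ensure_sequential_steps(client, clusterId)` before `add_job_flow_steps`. With a concurrency level above 1, several steps can be running at once, so a later step may finish before an earlier one even when they start in order.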

I'd be interested to know more about how the order gets mixed up in your case. Please share steps to reproduce the behavior you are observing.

Note: I'm using the latest boto3 version (1.20.26); I'm not sure whether that makes any difference.

AWS
SUPPORT ENGINEER
answered 2 years ago
