Step Functions and EMR Serverless integration


I am trying to invoke a Spark script that runs on EMR Serverless from a Step Functions state machine, but I have not found an out-of-the-box integration. One option is to use a Lambda function that submits the Spark job through Boto3; however, that approach does not let the state machine wait until the Spark job ends. So, how can I submit the Spark job and block the Step Functions execution until the job finishes?

1 Answer

If you are using a Standard workflow, you can invoke the Lambda function with the .waitForTaskToken service integration. This allows you to pass a task token to the Lambda function. Unless you specify a task timeout or heartbeat timeout, Step Functions can pause indefinitely and wait for an external process or workflow to complete. After invoking the Spark script from Lambda, you can call the SendTaskSuccess or SendTaskFailure API depending on the result of the Spark script (or however you like) to let Step Functions know the task status.

Standard Step Functions workflow -> Lambda task state (.waitForTaskToken) -> Lambda (starts the Spark job, then calls SendTaskSuccess | SendTaskFailure) # this workflow waits until the Lambda function calls one of the SendTask* APIs

{  
   "StartAt":"GetManualReview",
   "States":{  
      "GetManualReview":{  
         "Type":"Task",
         "Resource":"arn:aws:states:::lambda:invoke.waitForTaskToken",
         "Parameters":{  
            "FunctionName":"get-model-review-decision",
            "Payload":{  
               "model.$":"$.new_model",
               "token.$":"$$.Task.Token"
            },
            "Qualifier":"prod-v1"
         },
         "End":true
      }
   }
}
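
For the Lambda side, here is a minimal sketch (not part of the original answer) of a handler that receives the task token, submits the Spark job to EMR Serverless with Boto3, polls until the job run finishes, and reports back to Step Functions. The application ID, execution role ARN, and script location are placeholders; the in-Lambda polling also runs into the 15-minute Lambda timeout discussed in the comments below.

```python
# Hypothetical Lambda handler: submits a Spark job to EMR Serverless and
# reports the result back to Step Functions via the task token.
# APPLICATION_ID, EXECUTION_ROLE_ARN, and the entryPoint path are placeholders.
import json
import time

import boto3

emr = boto3.client("emr-serverless")
sfn = boto3.client("stepfunctions")

APPLICATION_ID = "00example1234567"  # placeholder EMR Serverless application ID
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/emr-serverless-job-role"  # placeholder


def handler(event, context):
    token = event["token"]  # task token passed in by the state machine ("token.$": "$$.Task.Token")
    try:
        # Submit the Spark script as an EMR Serverless job run.
        run = emr.start_job_run(
            applicationId=APPLICATION_ID,
            executionRoleArn=EXECUTION_ROLE_ARN,
            jobDriver={
                "sparkSubmit": {
                    "entryPoint": "s3://my-bucket/scripts/job.py"  # placeholder script location
                }
            },
        )
        job_run_id = run["jobRunId"]

        # Poll until the job reaches a terminal state. This must finish within the
        # Lambda timeout; see the comments below for asynchronous alternatives.
        while True:
            state = emr.get_job_run(
                applicationId=APPLICATION_ID, jobRunId=job_run_id
            )["jobRun"]["state"]
            if state == "SUCCESS":
                sfn.send_task_success(taskToken=token, output=json.dumps({"jobRunId": job_run_id}))
                break
            if state in ("FAILED", "CANCELLED"):
                sfn.send_task_failure(taskToken=token, error="SparkJobFailed", cause=state)
                break
            time.sleep(30)
    except Exception as exc:
        sfn.send_task_failure(taskToken=token, error="LambdaError", cause=str(exc))
```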
Taka_M (AWS), answered 2 years ago
  • Hi @Taka_M, thanks for your answer; the proposed solution looks great. I just have a couple of questions: what if the Spark script takes longer than the Lambda timeout (15 minutes)? And is there a way to react to an EMR Serverless failure event? Thanks for all your support.

  • You can retry the Lambda task if it times out by adding a Retry field to the task. A better approach is to handle it asynchronously: if you can make the SendTask* API call from your EMR job itself, your Lambda won't need to wait around just to get the Spark job status. You still pass the task token to Lambda, but Lambda hands it to the Spark job when it starts. If that's not feasible, you can invoke a Lambda function and then add a Wait state right after (if you know the job takes roughly x amount of time), followed by another Lambda task state that checks the status. You can put this in a loop using a Choice state: Lambda to kick off the Spark job -> Wait state -> Lambda to check the job -> Choice (to determine whether the job is done, failed, or needs more time), repeated as needed. Alternatively, use a Retry field on the last Lambda with a maximum retry count (returning a failure if the job still needs more time). A sketch of such a status-check Lambda is below.
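
Not part of the original comment, but a minimal sketch of the status-check Lambda for the Choice-state loop, assuming the kick-off Lambda passes the EMR Serverless application ID and job run ID along through the state machine input. The event field names are assumptions; the Choice state would branch on jobState (or isTerminal) to loop back to the Wait state, succeed, or fail.

```python
# Hypothetical status-check Lambda for the Choice-state loop described above.
# It queries the EMR Serverless job run state and returns it to the state machine.
import boto3

emr = boto3.client("emr-serverless")


def handler(event, context):
    state = emr.get_job_run(
        applicationId=event["applicationId"],  # assumed to come from the kick-off Lambda's output
        jobRunId=event["jobRunId"],
    )["jobRun"]["state"]

    # SUCCESS / FAILED / CANCELLED are terminal; anything else means the job is still in progress.
    return {
        "applicationId": event["applicationId"],
        "jobRunId": event["jobRunId"],
        "jobState": state,
        "isTerminal": state in ("SUCCESS", "FAILED", "CANCELLED"),
    }
```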
