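- Define catchers and retry policies in your state machine to handle different types of errors. For example, you can catch specific errors and decide whether to retry or proceed to a fallback state.
- Use a catch Lambda function to process the error payload and determine the type of error (for example, a business validation failure or a dependency outage). Based on the error type, you can decide the next steps.
- Use a Choice state after the catch Lambda to route the flow based on the error type. If the error is due to a dependency outage, you can retry the failed task. If the error is due to business validation, you can fail the execution or proceed as needed.
- Pass the state information and error details to the catch Lambda so that it knows which state failed and why. In the catch Lambda state's Parameters, fields from the context object are referenced with a double dollar sign, for example $$.Execution.Id (the execution ARN), $$.Execution.Input, and $$.State.Name. Note that $$.State.Name returns the name of the state that is currently running (the catch Lambda state itself), so the name of the failed state has to be passed explicitly, for example as a fixed value.
- Implement a manual intervention mechanism to resume from a particular state. A new execution always begins at the StartAt state, so it cannot start directly at the failed state. Instead, an external system can redrive the failed execution once the issue is fixed (Step Functions redrive restarts a failed Standard workflow execution from the state where it failed), or the workflow can use the callback pattern (.waitForTaskToken) to pause until an external system returns the task token.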
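The catch Lambda described in the list above could look roughly like the following minimal Python sketch. It assumes the Catch stores the error under $.error, where Step Functions writes an object with Error and Cause fields (for a failed Lambda, Cause is usually a JSON string containing errorMessage). The error names in the two sets are hypothetical placeholders for whatever your task functions actually raise, and the returned errorType values are the ones the Choice state in the example definition matches on. Anything unrecognized is labeled Unknown so it falls through to the Choice state's Default branch.

```python
import json

# Hypothetical error names: replace with the exception class names your task
# Lambdas raise and the AWS errors you want to treat as transient.
BUSINESS_ERRORS = {"ValidationError", "DuplicateCandidateError"}
TRANSIENT_ERRORS = {
    "Lambda.ServiceException",
    "Lambda.TooManyRequestsException",
    "States.Timeout",
}


def classify(error_name: str) -> str:
    """Map a Step Functions error name to the errorType the Choice state checks."""
    if error_name in BUSINESS_ERRORS:
        return "BusinessValidationError"
    if error_name in TRANSIENT_ERRORS:
        return "DependencyOutage"
    return "Unknown"  # routed to the Choice state's Default branch


def lambda_handler(event, context):
    # The Catch's "ResultPath": "$.error" places {"Error": ..., "Cause": ...} here.
    error = event.get("error", {})
    error_name = error.get("Error", "")
    cause = error.get("Cause", "")
    # Cause is often a JSON string from the failed Lambda; fall back to raw text.
    try:
        message = json.loads(cause).get("errorMessage", cause)
    except (json.JSONDecodeError, AttributeError, TypeError):
        message = cause
    return {
        "errorType": classify(error_name),
        "errorName": error_name,
        "message": message,
        # Present only if the state machine passes it in explicitly.
        "failedState": event.get("failedState"),
    }
```

Keeping the classification in one small function makes it easy to unit test and to extend as new error names show up in failed executions.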
Here is an example of how to structure your state machine JSON definition:
```json
{
  "Comment": "A sample state machine to demonstrate error classification and retry logic",
  "StartAt": "CreateCandidateId",
  "States": {
    "CreateCandidateId": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account-id:function:createCandidateId",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "CatchLambda"
        }
      ],
      "End": true
    },
    "CatchLambda": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account-id:function:catchLambda",
      "Parameters": {
        "failedState": "CreateCandidateId",
        "executionArn.$": "$$.Execution.Id",
        "executionInput.$": "$$.Execution.Input",
        "error.$": "$.error"
      },
      "ResultPath": "$.catchResult",
      "Next": "ErrorHandler"
    },
    "ErrorHandler": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.catchResult.errorType",
          "StringEquals": "BusinessValidationError",
          "Next": "BusinessValidationState"
        },
        {
          "Variable": "$.catchResult.errorType",
          "StringEquals": "DependencyOutage",
          "Next": "RetryCreateCandidateId"
        }
      ],
      "Default": "FailState"
    },
    "RetryCreateCandidateId": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account-id:function:createCandidateId",
      "Retry": [
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 10,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "FailState"
        }
      ],
      "End": true
    },
    "BusinessValidationState": {
      "Type": "Pass",
      "Result": "Business validation failed; handle or continue as needed",
      "ResultPath": "$.validationOutcome",
      "End": true
    },
    "FailState": {
      "Type": "Fail",
      "Error": "WorkflowFailed",
      "Cause": "State machine execution failed due to an error"
    }
  }
}
```
Explanation:
- CreateCandidateId: This is the initial state that might fail.
- CatchLambda: This Lambda function processes the error payload.
- ErrorHandler: This Choice state determines the next steps based on the error type.
- RetryCreateCandidateId: This state calls the createCandidateId function again, retrying up to three times with exponential backoff, if the error is due to a dependency outage.
- BusinessValidationState: This state handles business validation errors.
- FailState: This state is reached if the error cannot be handled.
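To make the RetryCreateCandidateId timing concrete: with IntervalSeconds 10, BackoffRate 2.0 and MaxAttempts 3, each retry waits IntervalSeconds multiplied by BackoffRate raised to the retry index. The helper below only illustrates that arithmetic; it is not part of any AWS SDK, and it ignores the optional MaxDelaySeconds cap and JitterStrategy setting.

```python
from typing import List


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> List[float]:
    """Seconds waited before each retry of a Retrier (no jitter, no MaxDelaySeconds cap)."""
    # Retry n (0-based) waits IntervalSeconds * BackoffRate ** n.
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


delays = retry_delays(10, 3, 2.0)
print(delays)       # [10.0, 20.0, 40.0]
print(sum(delays))  # 70.0
```

So a dependency outage that persists costs roughly 70 seconds of waiting, plus the task's own run time for each of the four attempts, before the Catch sends the execution to FailState.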
Hello Oleksii,
From the first catch Lambda, I will push a message to an SQS dead-letter queue (DLQ) with all the details required for redrive, such as stateName, executionId, executionArn, and stateInput. After I fix the issue (a code bug or anything else that requires manual intervention), I want to redrive this DLQ, which will move the messages into the main SQS queue. The main queue will trigger a Lambda function that redrives the state machine execution.
My concern is that there will be two failed states: the Fail state and whichever Task state actually failed. Is it possible to redrive from the Task state? When I used redrive in the AWS console, it restarted from the Fail state.
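A minimal Python sketch of the SQS-triggered redrive Lambda described in this comment follows. It assumes boto3, a Standard workflow (redrive is not available for Express workflows), and a hypothetical JSON message body that carries at least executionArn. It calls the Step Functions DescribeExecution and RedriveExecution APIs. RedriveExecution takes the execution ARN (plus an optional idempotency token) but no state name: it resumes from the state where the execution failed. That is consistent with the console behavior described above, and it suggests that for redrive to restart at the Task state, the error has to fail the execution at that Task state rather than being caught and routed into a Fail state.

```python
import json

# Execution statuses that Step Functions allows to be redriven.
REDRIVABLE_STATUSES = {"FAILED", "ABORTED", "TIMED_OUT"}


def redrive_from_message(sfn, body: str) -> str:
    """Redrive the execution named in one SQS message body."""
    # Hypothetical message shape, e.g.
    # {"executionArn": "...", "stateName": "...", "stateInput": {...}}
    execution_arn = json.loads(body)["executionArn"]
    status = sfn.describe_execution(executionArn=execution_arn)["status"]
    if status not in REDRIVABLE_STATUSES:
        # Already running again or succeeded (e.g. a duplicate message): skip it.
        return "skipped:" + status
    # No state name is passed: redrive resumes where the execution failed.
    sfn.redrive_execution(executionArn=execution_arn)
    return "redriven"


def lambda_handler(event, context, sfn=None):
    """SQS-triggered handler; sfn can be injected for local testing."""
    if sfn is None:
        import boto3  # included in the AWS Lambda Python runtime
        sfn = boto3.client("stepfunctions")
    failures = []
    for record in event["Records"]:
        try:
            redrive_from_message(sfn, record["body"])
        except Exception:
            # Report only this message as failed so SQS retries it alone
            # (requires ReportBatchItemFailures on the event source mapping).
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Checking the status first makes a duplicate message harmless, and reporting per-message failures lets an execution that cannot be redriven (for example, one past the redrive window) return to the DLQ without blocking the rest of the batch.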