There are a few potential reasons why SQS messages might be moving to the Dead Letter Queue (DLQ) without being delivered to your Lambda function:
- Distributed-system inconsistencies: Amazon SQS is a distributed system, and in rare cases a message's receive count can be incremented even though the consumer (in this case, your Lambda function) never actually received it. This can happen due to network issues or other transient failures. Each such occurrence counts against the redrive policy's maxReceiveCount, even though your function never processed the message.
- Low maxReceiveCount setting: If the maxReceiveCount value in your redrive policy is set too low (e.g., 1 or 2), messages can be moved to the DLQ before Lambda has a chance to process them successfully. It's recommended to set this value higher to allow for multiple processing attempts.
- Visibility timeout issues: If your Lambda function takes longer to process a batch than the visibility timeout set on your queue, the messages become visible again and can be delivered a second time, leading to duplicate processing and extra receive counts that push messages toward the DLQ.
- Lambda concurrency limits: If your Lambda function is hitting its concurrency limits, it might not be able to process all incoming messages, and the repeated failed receive attempts can cause some to be moved to the DLQ.
- Lambda errors not properly caught: If your Lambda function is encountering errors that are not being properly handled or logged, whole batches are retried and it can appear as though the messages were never received.
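The redrive policy and visibility timeout settings mentioned above can be sketched as follows; this is a minimal example assuming boto3, and the queue URL and DLQ ARN are hypothetical placeholders you would replace with your own:

```python
import json

# Hypothetical values; substitute your own queue URL and DLQ ARN.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:my-dlq"
LAMBDA_TIMEOUT_SECONDS = 60

attributes = {
    # AWS recommends a visibility timeout of at least 6x the function timeout.
    "VisibilityTimeout": str(6 * LAMBDA_TIMEOUT_SECONDS),
    # Allow several delivery attempts before redriving to the DLQ.
    "RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": "5"}
    ),
}

# With boto3 and valid AWS credentials, you would apply it like so:
# boto3.client("sqs").set_queue_attributes(
#     QueueUrl=QUEUE_URL, Attributes=attributes
# )
print(attributes["VisibilityTimeout"])  # prints "360"
```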
To troubleshoot this issue:
- Increase the maxReceiveCount in your redrive policy to allow for more processing attempts.
- Ensure your visibility timeout is set to at least 6 times the Lambda function's timeout.
- Check your Lambda function's concurrency settings and consider increasing them if necessary.
- Implement comprehensive error handling and logging in your Lambda function to catch any potential issues.
- Monitor your Lambda function's performance metrics, including duration, to ensure it's not timing out.
If the issue persists after implementing these changes, you may need to engage AWS support for further investigation, as there could be underlying issues with the SQS-Lambda integration or your specific configuration.
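To make the error-handling recommendation concrete, here is a minimal sketch of an SQS-triggered handler that logs every record and reports per-item failures instead of failing the whole batch. It assumes ReportBatchItemFailures is enabled on the event source mapping, and `process_item` is a hypothetical stand-in for your real business logic:

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process_item(item):
    # Hypothetical placeholder for your real processing logic.
    if item.get("fail"):
        raise ValueError("downstream call failed")


def handler(event, context):
    """SQS-triggered handler using partial batch responses.

    Messages listed in batchItemFailures stay on the queue and are
    retried; all other messages in the batch are deleted by Lambda.
    Without ReportBatchItemFailures enabled, the return value is
    ignored and any raised error retries the entire batch.
    """
    failures = []
    for record in event["Records"]:
        message_id = record["messageId"]
        try:
            process_item(json.loads(record["body"]))
            logger.info("processed %s", message_id)
        except Exception:
            # Log the full traceback so failures are visible in CloudWatch.
            logger.exception("failed to process %s", message_id)
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list signals that every record in the batch succeeded.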
I had a similar experience and 'solved' it by switching over to using a FIFO SQS queue. I could not escape this issue with the standard queue where some items would disappear into "in flight" without a Lambda instance actually receiving them. Visibility timeout would occur and they'd end up in the dead letter queue.
My use case is to place a bunch of items on the queue quickly, then have Lambda process a maximum of five concurrent batches of five items each. I set a maximum concurrency of five on the SQS event source triggering my Lambda to throttle how fast I was hitting a downstream service called by the Lambda. The queue's visibility timeout was twenty minutes. It would spin up five Lambda instances just fine, and each ran for about sixty seconds to process the five items in the batch it received. The Lambda indicated success by returning an empty set of batch item failures, and the items would be deleted from the queue.

As it neared the end of the available queue items, though, dequeued items would disappear into "in flight" status and then land in either the dead-letter queue (if configured) or an eventual successful retry (if no dead-letter queue) after the visibility timeout. It was clearly related to how long my Lambda took to run (it had a fifteen-minute execution timeout); if it processed a batch in ten seconds, I never encountered the issue. I could not find any real guidance on handling this backpressure other than processing faster (not possible) or processing more in parallel (not desirable). FIFO seems to act how I want it to.
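For reference, a throttled event source mapping like the one described above can be sketched with the AWS CLI; the function name and queue ARN below are placeholders, and note that MaximumConcurrency has a minimum allowed value of 2:

```shell
# Hypothetical names/ARNs; substitute your own.
# Caps concurrent batches at 5 and enables partial batch responses
# so only failed items are retried rather than the whole batch.
aws lambda create-event-source-mapping \
  --function-name my-processor \
  --event-source-arn arn:aws:sqs:us-east-1:123456789012:my-queue \
  --batch-size 5 \
  --scaling-config MaximumConcurrency=5 \
  --function-response-types ReportBatchItemFailures
```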