If a Lambda consuming a FIFO queue times out, why did my SQS message not get redriven or put into the DLQ?

I'm trying to track down a message that seems to have disappeared. I have a very simple setup:

  1. FIFO SQS queue, VisibilityTimeout 1000, RedrivePolicy.maxReceiveCount: 2 + DLQ. Receiving a single MessageGroupId, ContentBasedDeduplication true
  2. Lambda BatchSize: 1, 900s timeout

The Lambda logs (timestamps in epoch milliseconds):

1686766042995 START RequestId: 73fe8424-3542-5064-8e8e-d920314861e8 Logs MessageA
1686766064001 END RequestId: 73fe8424-3542-5064-8e8e-d920314861e8
1686766064031 START RequestId: d616b421-29c1-5baf-8896-e9252de7060e Logs MessageB
1686766965267 START RequestId: e31f6b71-9489-5b9e-b236-3675932898b8 Logs MessageC
1686766965299 END RequestId: d616b421-29c1-5baf-8896-e9252de7060e

I'm very confused here because:

  1. When e31f6b71-9489-5b9e-b236-3675932898b8 starts, just slightly over 900s have passed, but the visibility timeout is 1000.

It could be that expected behavior here is that when lambda times out and gets terminated, it implicitly calls set visibility timeout on the message. (does it?)

  2. The event that e31f6b71-9489-5b9e-b236-3675932898b8 receives is completely different, but they have the same MessageGroupId.
  3. There are no events in the DLQ.
  4. The event is never seen again, and the content was seen 1.2 hours earlier.
  5. The Lambda monitoring shows it has an error, and there are no other lambdas executing.
  6. The lambda business logic never calls SQS directly.

It's almost like the lambda handler succeeds when being marked as timed out, and then gets marked as an error.
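
The timing in the logs above can be checked with a little arithmetic. The following is a minimal Python sketch: the timestamps and the two timeout values are copied from the question, while the variable and function names are only illustrative.

```python
# Timestamps (epoch milliseconds) copied from the log lines in the question.
START_A = 1686766042995  # 73fe8424... START, logs MessageA
END_A = 1686766064001    # 73fe8424... END
START_B = 1686766064031  # d616b421... START, logs MessageB
START_C = 1686766965267  # e31f6b71... START, logs MessageC
END_B = 1686766965299    # d616b421... END

LAMBDA_TIMEOUT_S = 900       # Lambda timeout from the question
VISIBILITY_TIMEOUT_S = 1000  # queue VisibilityTimeout from the question


def seconds_between(start_ms: int, end_ms: int) -> float:
    """Elapsed time in seconds between two epoch-millisecond timestamps."""
    return (end_ms - start_ms) / 1000


a_duration = seconds_between(START_A, END_A)  # first invocation, START to END
b_duration = seconds_between(START_B, END_B)  # second invocation, START to END
b_to_c = seconds_between(START_B, START_C)    # second START to third START

print(f"Invocation A ran for {a_duration:.3f}s")
print(f"Invocation B ran for {b_duration:.3f}s")
print(f"Invocation C started {b_to_c:.3f}s after B started")
print("B exceeded the Lambda timeout:", b_duration > LAMBDA_TIMEOUT_S)
print("C started inside B's visibility window:", b_to_c < VISIBILITY_TIMEOUT_S)
```

Under these numbers, the third invocation starts after the 900s Lambda timeout has elapsed but before the 1000s visibility timeout would have expired.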

1 Answer

A lot depends on the logic in your Lambda around when it calls DeleteMessage. You want to make sure that it's called if and only if processing of the message has been successful.

"It could be that expected behavior here is that when lambda times out and gets terminated, it implicitly calls set visibility timeout on the message" - no, that's not what happens. The idea is that if your Lambda times out without calling DeleteMessage then once the message's visibility timeout expires it will become visible on the queue again so it can be picked up for processing.

Given that your visibility timeout is 1000 but your 3rd Lambda call is only 900 seconds after the 2nd call, there's no way it could get the same message. A later call could get the same message, but only if DeleteMessage hasn't been called on it previously.
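
To make the visibility-timeout and redrive behavior described above concrete, here is a toy in-memory model in Python. It is only a sketch under stated assumptions: `ToyQueue`, `Message`, and their methods are hypothetical names, not the SQS API; it models a single message and ignores FIFO message-group ordering and deduplication; and it assumes a message moves to the DLQ on the receive attempt that would exceed maxReceiveCount.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    body: str
    receive_count: int = 0
    invisible_until: float = 0.0  # time (seconds) at which it becomes visible again


@dataclass
class ToyQueue:
    """Hypothetical stand-in for a queue with a visibility timeout and a DLQ."""
    visibility_timeout: float
    max_receive_count: int
    messages: List[Message] = field(default_factory=list)
    dead_letters: List[Message] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float) -> Optional[Message]:
        """Return the first visible message, or None if nothing is visible."""
        for msg in list(self.messages):
            if now < msg.invisible_until:
                continue  # still in flight: hidden from consumers
            if msg.receive_count >= self.max_receive_count:
                # Receiving again would exceed the limit, so redrive to the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        """Remove a message for good (what a DeleteMessage call achieves)."""
        self.messages.remove(msg)


# Values from the question: VisibilityTimeout 1000, maxReceiveCount 2.
queue = ToyQueue(visibility_timeout=1000, max_receive_count=2)
queue.send("MessageB")

first = queue.receive(now=0)       # delivered; the consumer then times out
early = queue.receive(now=901)     # inside the visibility window
second = queue.receive(now=1000)   # window expired, never deleted
third = queue.receive(now=2000)    # limit reached: goes to the DLQ instead

print("first delivery:", first.body, "receive_count at delivery = 1")
print("attempt at 901s:", early)
print("second delivery receive_count:", second.receive_count)
print("attempt at 2000s:", third)
print("DLQ contents:", [m.body for m in queue.dead_letters])
```

In this model a message that is never deleted is either redelivered after its visibility timeout or ends up in the DLQ; it only vanishes if something calls `delete` on it.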

EXPERT
answered 10 months ago
  • My lambda doesn't have any logic at all that touches SQS including calling DeleteMessage, it's only receiving events. So how does a lambda that times out end up deleting a message?
