Infinite retries due to exceeded SQS visibility timeout


I am using SQS to queue messages that are processed by workers running in ECS tasks. I understand that if a message's processing time exceeds the SQS visibility timeout, the same message becomes visible to other workers again, which causes an infinite loop and the queue gets stuck on the same message. Is there a way to limit the retries or unblock the other messages?

karan
asked 14 days ago · 79 views
2 Answers

Hello.

Basically, the application must be designed so that there is no problem even if it processes the same SQS message more than once.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/standard-queues-at-least-once-delivery.html

If this occurs, the copy of the message isn't deleted on that unavailable server, and you might get that message copy again when you receive messages. Design your applications to be idempotent (they should not be affected adversely when processing the same message more than once).
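
For example, here is a minimal sketch of an idempotent consumer in Python with boto3. The queue URL, the DynamoDB table `processed_messages` (used as a deduplication store keyed on `message_id`), and the `handle()` function are placeholders for illustration, not part of any specific API:

```python
import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")
dynamodb = boto3.client("dynamodb")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder
DEDUP_TABLE = "processed_messages"  # hypothetical table with partition key "message_id"


def handle(body: str) -> None:
    # Placeholder for the real processing logic in the ECS worker.
    print("processing", body)


def already_processed(message_id: str) -> bool:
    """Record the message ID with a conditional write; a failed condition
    means another worker already handled this message."""
    try:
        dynamodb.put_item(
            TableName=DEDUP_TABLE,
            Item={"message_id": {"S": message_id}},
            ConditionExpression="attribute_not_exists(message_id)",
        )
        return False
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return True
        raise


def consume_once() -> None:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        if not already_processed(msg["MessageId"]):
            handle(msg["Body"])
        # Delete in either case: the work is done, now or by an earlier receive.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

With this pattern a duplicate delivery is simply detected and deleted, so redelivery after a missed visibility timeout does no harm.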

The Lambda documentation recommends setting the visibility timeout to at least six times the function timeout.
So even if you are using ECS, I think it is a good idea to set the visibility timeout to a fairly long period relative to your processing time.
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

To allow your function time to process each batch of records, set the source queue's visibility timeout to at least six times the timeout that you configure on your function. The extra time allows for Lambda to retry if your function is throttled while processing a previous batch.

If you set up a dead-letter queue and configure the maximum receive count on the source queue, messages that keep failing are moved to the DLQ after that many receives, which limits the retries and keeps the queue from getting stuck on one message.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-dead-letter-queue.html
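
As a rough sketch with boto3, both of the above (a longer visibility timeout and a DLQ redrive policy) are just queue attributes. The queue URL, DLQ ARN, and the concrete numbers below are placeholders you would adjust for your workload:

```python
import json
import boto3

sqs = boto3.client("sqs")

source_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:my-queue-dlq"                     # placeholder

sqs.set_queue_attributes(
    QueueUrl=source_queue_url,
    Attributes={
        # Roughly six times the expected processing time, per the guidance above.
        "VisibilityTimeout": "1800",
        # After 5 failed receives the message moves to the DLQ instead of looping forever.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": "5",
        }),
    },
)
```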

EXPERT
answered 14 days ago
EXPERT
reviewed 14 days ago
  • Thanks for the suggestion. The time taken by messages varies, so I cannot set a single high visibility timeout. I want to use a DLQ and reprocess the same message in the source queue with a higher visibility timeout for that specific message. Is there a standard way of doing this? My idea is to trigger a Lambda when a message is moved to the DLQ; the Lambda would send the message back to the source queue with a higher visibility timeout.


You should make sure that your visibility timeout is longer than the processing time. The challenge is that if the consumer fails to process the message and you have configured a long visibility timeout, it will take a long time before the message becomes visible and can be processed again. In this case the best practice is to set a shorter visibility timeout and extend it from the consumer while it is processing the message.
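
A rough sketch of that heartbeat pattern in Python with boto3 (the queue URL, the timing numbers, and `work_fn` are placeholders; with Celery you would need to hook this into the worker yourself):

```python
import threading
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder


def process_with_heartbeat(message, work_fn, heartbeat_seconds=60):
    """Keep extending the message's visibility timeout while work_fn runs,
    then delete the message only after the work succeeds."""
    stop = threading.Event()

    def heartbeat():
        # Every heartbeat_seconds, push the visibility timeout further out.
        while not stop.wait(heartbeat_seconds):
            sqs.change_message_visibility(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
                VisibilityTimeout=heartbeat_seconds * 2,  # keep a safety margin
            )

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        work_fn(message["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
    finally:
        stop.set()
        t.join()
```

If the worker crashes, the heartbeat stops and the message becomes visible again after at most the short timeout, so other messages are not blocked for long.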

In addition, configure a DLQ and set the maximum receive count to a small number, so that even if you do not extend the timeout, you will not get into an infinite loop.

AWS
EXPERT
Uri
answered 14 days ago
  • Thanks for the suggestion. Changing the visibility timeout from the consumer is not in scope because the tasks are run by Celery. I want to use a DLQ and reprocess the same message in the source queue with a higher visibility timeout. Is there a standard way of doing this? My idea is to trigger a Lambda when a message is moved to the DLQ; the Lambda would send the message back to the source queue with a higher visibility timeout.

  • You can't specify a visibility timeout when sending a message, so this will not work. Why not start with a higher default visibility timeout on the queue, so that every message is processed only once?
