Is this a bug in lambda or SQS recursive loop detection?


After reading https://aws.amazon.com/blogs/compute/detecting-and-stopping-recursive-loops-in-aws-lambda-functions/ , I understand that a batch of SQS messages will be dropped if any lineage value is greater than 16. However, when testing this out, I find that all messages in an SQS batch take on the maximum lineage value of any message in the batch. Is this supposed to happen? It means that some messages can be dropped prematurely if they are unlucky enough to end up in a batch with a higher lineage value.

For example, say a system is designed so that, under normal operation, each message loops through a series of queues and Lambda functions 10 times. A message that has just started its journey may take on the higher lineage of an older message in the same batch, and thus get dropped prematurely.
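A minimal Python sketch of the behaviour described above, assuming (as stated in the question) that a batch is dropped when its highest lineage exceeds 16 and that every message in the batch inherits that highest value. `Message`, `lineage`, and `process_batch` are hypothetical names for illustration; this models the observed behaviour and is not Lambda's actual implementation.

```python
from dataclasses import dataclass

# Threshold taken from the question: a batch is dropped if any lineage
# value is greater than 16. This is a model of observed behaviour only.
LINEAGE_LIMIT = 16


@dataclass
class Message:
    body: str
    lineage: int  # hypothetical count of hops this message has made so far


def process_batch(batch):
    """Model one Lambda invocation that receives an SQS batch.

    Every message in the batch takes on the highest lineage in the batch.
    Returns (kept, dropped) lists of messages.
    """
    batch_lineage = max(m.lineage for m in batch)
    if batch_lineage > LINEAGE_LIMIT:
        # The whole batch is dropped, including messages with low lineage.
        return [], list(batch)
    # Survivors are re-sent with the batch lineage plus one hop.
    kept = [Message(m.body, batch_lineage + 1) for m in batch]
    return kept, []


# A brand-new message shares a batch with one that has already looped 17 times.
old = Message("old", lineage=17)
new = Message("new", lineage=1)
kept, dropped = process_batch([old, new])
print([m.body for m in dropped])  # ['old', 'new']
```

In this model, "new" is dropped after a single hop purely because of its batch-mate, which is the premature drop the question describes.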

Here's a reproduction: https://github.com/uozuAho/aws_recursion_bug

Warwick
Asked 1 month ago · 167 views
2 answers
Accepted answer

The answer I got from enterprise AWS support is that this is expected behaviour. No explanation as to why, unfortunately.

Warwick
Answered 21 days ago

You make a good point. When SQS batches messages for processing, messages with lower lineage values can take on the maximum lineage of the batch, which can cause them to be dropped prematurely.

There are some things you can try when setting up SQS:

  • Keep batch sizes small; a target of 5 or fewer may help.
  • Use message groups in FIFO queues to further isolate message processing.
  • Make message processing idempotent so it doesn't matter if a message is processed multiple times.
  • Consider adding more queues or sharding to split the message load across multiple queues.
  • Increase the visibility timeout to reduce the frequency of polling. Giving processes more time may alleviate some batching issues.
  • Use dead-letter queues (DLQs) to hold and reprocess failed messages separately from the main queue.
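The batch-size suggestion can be illustrated with a small Python model. It assumes, as in the question, that a whole batch is dropped when its highest lineage exceeds 16, and, as a simplification, that messages are batched strictly in arrival order (real SQS batching is not guaranteed to behave this way). `premature_drops` is a hypothetical helper that counts messages dropped only because a batch-mate exceeded the limit.

```python
def premature_drops(lineages, batch_size, limit=16):
    """Count messages dropped only because another message in their batch
    exceeded `limit`. Messages are grouped into batches in list order."""
    count = 0
    for start in range(0, len(lineages), batch_size):
        batch = lineages[start:start + batch_size]
        if max(batch) > limit:
            # Every message still within the limit is a premature drop.
            count += sum(1 for value in batch if value <= limit)
    return count


# Two "old" messages (17 and 18) arrive among six fresh ones.
arrival_order = [1, 17, 2, 3, 4, 5, 18, 6]
for size in (1, 2, 4):
    print(size, premature_drops(arrival_order, size))
# 1 0
# 2 2
# 4 6
```

In this model a batch size of 1 rules out the mixing entirely, at the cost of one Lambda invocation per message; larger batches trade throughput for a higher chance of premature drops.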

I hope this helps.

Expert
Answered 1 month ago
