SQS batching S3 events issue


Hi,

Use Case Architecture: S3 events => SQS Standard Queue => Lambda

Use case: files dropped into a specific bucket within a specific interval (seconds) should be processed as one batch by a single Lambda execution.

Issue: regardless of the configuration, when more than one file is dropped, the S3 events are not processed by the same Lambda execution. There is always a gap of only 1-2 ms before a new execution starts.

Configurations:

  • Lambda Function:
    • Concurrency: 1
    • Memory: 10,240 MB
    • Timeout: 15 min 0 s
  • Lambda SQS Trigger:
    • Batch size: 10
    • Batch window: 300 seconds
  • SQS Queue:
    • Receive message wait time: 20 seconds (tried various polling wait times)
    • Default visibility timeout: 20 minutes
    • Delivery delay: 0 seconds (tried various delay times)
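For reference, the settings above can be expressed in code. This is a minimal sketch assuming the boto3 SDK; the helper names are hypothetical, and `mapping_uuid`, `queue_url` and `region` are placeholders. It only mirrors the listed values; it is not a fix for the batching behavior discussed in this thread.

```python
def build_event_source_mapping_params(mapping_uuid: str) -> dict:
    """Keyword arguments for Lambda's UpdateEventSourceMapping call (hypothetical helper)."""
    return {
        "UUID": mapping_uuid,                   # placeholder: the SQS trigger's UUID
        "BatchSize": 10,                        # Lambda SQS trigger: batch size
        "MaximumBatchingWindowInSeconds": 300,  # Lambda SQS trigger: batch window
    }


def build_queue_attributes() -> dict:
    """Attributes for SQS SetQueueAttributes; SQS expects string values."""
    return {
        "ReceiveMessageWaitTimeSeconds": "20",  # long polling wait time
        "VisibilityTimeout": "1200",            # 20 minutes, in seconds
        "DelaySeconds": "0",                    # delivery delay
    }


def apply_settings(mapping_uuid: str, queue_url: str, region: str) -> None:
    # Imported here so the builders above can be used without boto3 installed.
    import boto3

    boto3.client("lambda", region_name=region).update_event_source_mapping(
        **build_event_source_mapping_params(mapping_uuid)
    )
    boto3.client("sqs", region_name=region).set_queue_attributes(
        QueueUrl=queue_url, Attributes=build_queue_attributes()
    )
```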

Execution Logs:

  • Number of S3 events: 5
  • Lambda Executions:
    • Max Memory Used: 71-72 MB
    • Duration: 45-211ms
    • Number of Executions: 3-4

Based on the logs, each execution handles at most 2 messages. The Lambda code is very straightforward and not a heavy process.
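To see how many messages each invocation actually receives, a handler only needs to unpack the SQS records and the S3 notification inside each message body. This is a minimal sketch, not the asker's code; the function names are hypothetical. It skips bodies without `Records` (such as the `s3:TestEvent` S3 sends when notifications are configured) and URL-decodes object keys, which S3 encodes in event notifications.

```python
from __future__ import annotations

import json
import logging
from urllib.parse import unquote_plus

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def extract_s3_objects(sqs_event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an SQS batch of S3 notifications."""
    objects = []
    for message in sqs_event.get("Records", []):
        body = json.loads(message["body"])
        # Test events and other non-object notifications carry no "Records".
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = unquote_plus(s3_record["s3"]["object"]["key"])  # "+" -> space, etc.
            objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    objects = extract_s3_objects(event)
    # Logging the batch size per invocation makes the splitting visible in CloudWatch.
    logger.info("SQS messages in batch: %d, S3 objects: %d",
                len(event.get("Records", [])), len(objects))
    return {"objects": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```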

2 Answers

S3 and SQS are distributed systems. When you upload multiple files at the same time, it is unlikely that they are processed by the same resources. Likewise, event propagation to SQS is itself distributed, so the events are unlikely to reach SQS at the same moment. Finally, SQS is a distributed system, so not every endpoint sees the same number of available messages.

What is the use case you are trying to solve?

AWS
answered 2 years ago
  • Hi Rodney,

    Thanks for the response. The use case relies on files dropped into a bucket within a specific interval being processed as a single batch/step in EMR, triggered by a Lambda function (one producer: S3 events; one consumer: Lambda/EMR). We were relying on Lambda's batching capabilities to collect those events from SQS and submit them to EMR as a single step, but judging from your response that does not look possible.
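The grouping the asker wants ("files dropped within an interval form one batch") can be stated in terms of the S3 `eventTime` field. This is a minimal sketch of that grouping only; `group_by_interval` is a hypothetical helper, it assumes `eventTime` uses the `1970-01-01T00:00:00.000Z` format S3 documents, and it uses fixed, epoch-aligned windows, so two files dropped either side of a boundary land in different groups. It does not make SQS or Lambda deliver the events together; a consumer would still need to hold state across invocations to apply it.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timezone


def parse_event_time(value: str) -> datetime:
    # Assumed S3 eventTime format, e.g. "2023-01-01T12:00:00.123Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def group_by_interval(records: list[dict], interval_seconds: int) -> dict[int, list[str]]:
    """Group S3 event records into fixed windows of interval_seconds (hypothetical)."""
    groups: dict[int, list[str]] = defaultdict(list)
    for record in records:
        ts = parse_event_time(record["eventTime"]).timestamp()
        window = int(ts // interval_seconds)  # index of the epoch-aligned window
        groups[window].append(record["s3"]["object"]["key"])
    return dict(groups)
```

With a 300-second interval, files dropped at 12:00:01 and 12:02:30 share a group, while one dropped at 12:05:30 starts the next group.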


Reading Rodney's answer, I think it holds for short polling.

Looking at this doc, with long polling it should work as you need.

But I remember having a similar issue, and there was an explanation of why it does not always collect all messages. Unfortunately, I don't remember it.

Still, judging by the doc, it should work as you expect.

MG
answered 2 years ago
  • Hi MG,

    I have tried both short and long polling without success. According to Rodney, it can't batch parallel events, which is a real shame.

  • Ok yeah, now I remember :)

    While the regular short polling returns immediately, even if the message queue being polled is empty, long polling doesn’t return a response until a message arrives in the message queue, or the long poll times out.

    So basically it is impossible, as Rodney wrote; you are right. Even with long polling, the response is returned as soon as a message is available. The only difference is that ReceiveMessage queries all SQS servers (rather than a subset of them) and waits up to the configured time for at least one message to arrive.
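The effect described above can be illustrated with a deliberately simplified toy model: a long-poll receive returns the moment at least one message is visible, carrying only what is visible at that instant. This is not how SQS is implemented (real SQS is distributed and may return fewer messages than are visible), and `simulate_long_poll` is a hypothetical name; it only shows why messages arriving 1 ms apart can end up in separate receives.

```python
from __future__ import annotations


def simulate_long_poll(arrivals_ms: list[int], max_messages: int = 10,
                       wait_ms: int = 20_000, start_ms: int = 0) -> list[list[int]]:
    """Toy model: each receive returns as soon as >= 1 message is visible."""
    pending = sorted(arrivals_ms)  # visibility times of messages still in the queue
    batches = []
    now = start_ms
    while pending:
        if pending[0] > now + wait_ms:
            now += wait_ms  # long poll times out with an empty response; poll again
            continue
        now = max(now, pending[0])  # the call returns the moment a message is visible
        visible = [t for t in pending if t <= now][:max_messages]
        pending = pending[len(visible):]  # pending is sorted, so visible is a prefix
        batches.append(visible)
    return batches


print(simulate_long_poll([0, 1, 2, 3, 4]))              # five receives of one message each
print(simulate_long_poll([0, 1, 2, 3, 4], start_ms=2))  # a later first poll sees three
```

In this model the batch contents depend only on when each poll happens to start relative to the arrivals, which matches the observation that a longer wait time does not by itself merge nearly simultaneous events.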

  • The thing is that the batch window is set to 300 seconds, so I would expect it to pick up all the messages from the queue during that window, but unfortunately it doesn't work like that. Thanks for the help though, MG :)

  • Ok, got it!

    Lambda processes up to five batches at a time. This means that there are a maximum of five workers available to batch and process messages in parallel at any one time. Each worker shows a distinct Lambda invocation for its current batch of messages.

    From here.

    That's why the messages are spread across batches.

  • I don't believe it's a Lambda issue (even with concurrency set to 1, the batches arrive one after another), because we tried polling from EMR and the parallel events again arrived in separate batches.
