Polling an SQS queue containing 1 million messages without running out of Lambda concurrency


Hi, we have a standard (non-FIFO) SQS queue into which the producer publishes thousands to millions of messages, so the number of live messages in the queue is constantly between about 5,000 and 1 million.

We use boto3's receive_message in Python to poll for messages:

```python
message = sqs.receive_message(
    QueueUrl=queue_url,
    AttributeNames=['SentTimestamp'],
    MessageAttributeNames=['All'],
    VisibilityTimeout=120,
    MaxNumberOfMessages=MAX_RECEIVE_MESSAGES,
    WaitTimeSeconds=MAX_WAIT_TIME_SECS,
)
```

To consume the messages, we are considering two designs:

  • We had a Lambda function subscribed to the queue to process the messages. The function calls no other APIs; it only reads the messages and, based on the value of a certain field (for instance, student_last_name="Smith"), publishes/re-routes each one to a different SNS topic. With millions of messages flowing through, won't this design quickly hit the Lambda concurrency limit? We have another, similar queue where a similar function takes about 4 seconds per invocation (it is more complicated and interacts with other systems' APIs), and we quickly ran out of Lambda concurrency.
  • Another design is to have a CloudWatch Events rule trigger the Lambda function every minute. The function calls receive_message as shown above, with MaxNumberOfMessages=10 (the maximum allowed), in a loop. The loop exits either after a set number of iterations or once it has collected around 10,000 messages; the function then processes those messages and routes them to the different SNS topics.
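A minimal sketch of the second design's collect-and-delete loop, assuming `sqs` is a client created with `boto3.client("sqs")`. The function names `collect_messages` and `delete_processed` are hypothetical, and the routing step itself is left out: it would run between collecting and deleting. Unlike the SQS trigger, this design must delete messages itself once they are routed.

```python
from typing import Any, Dict, List

MAX_PER_CALL = 10  # ReceiveMessage and DeleteMessageBatch both cap at 10 messages


def collect_messages(sqs, queue_url: str, target: int = 10_000, max_calls: int = 1_000,
                     wait_secs: int = 1, visibility_secs: int = 120) -> List[Dict[str, Any]]:
    """Poll in a loop until `target` messages are collected, `max_calls` calls
    have been made, or a long-poll call comes back empty."""
    collected: List[Dict[str, Any]] = []
    for _ in range(max_calls):
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            AttributeNames=["SentTimestamp"],
            MessageAttributeNames=["All"],
            VisibilityTimeout=visibility_secs,
            MaxNumberOfMessages=MAX_PER_CALL,
            WaitTimeSeconds=wait_secs,
        )
        batch = resp.get("Messages", [])  # the key is absent when nothing was received
        if not batch:
            break
        collected.extend(batch)
        if len(collected) >= target:
            break
    return collected


def delete_processed(sqs, queue_url: str, messages: List[Dict[str, Any]]) -> int:
    """Delete routed messages in chunks of 10; return how many deletions failed."""
    failed = 0
    for start in range(0, len(messages), MAX_PER_CALL):
        chunk = messages[start:start + MAX_PER_CALL]
        resp = sqs.delete_message_batch(
            QueueUrl=queue_url,
            Entries=[{"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                     for i, m in enumerate(chunk)],
        )
        # DeleteMessageBatch reports per-entry failures instead of raising.
        failed += len(resp.get("Failed", []))
    return failed
```

At 10 messages per call, collecting 10,000 messages takes at least 1,000 receive_message calls, so the visibility timeout has to cover the whole collect-and-route cycle; otherwise the earliest messages become visible again and are delivered twice.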

Neither design seems ideal: the first carries the concurrency risk, and the second only lets us process messages once a minute. Could an AWS architect or expert provide further guidance? Thank you very much.

Asked a year ago, 648 views
1 Answer

You should use the SQS Lambda trigger to consume the messages from the queue. Concurrency scales automatically between 0 and 1,000, based on the number of messages in the queue. If you have other functions in the account and are concerned that this function's concurrency will prevent them from running, you have two options:

  1. Request a concurrency limit increase.
  2. Use reserved concurrency on the function to limit its maximum concurrency.
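To judge whether the default limit is actually at risk, a common rule of thumb is that steady-state concurrency is roughly invocations per second times seconds per invocation. The sketch below computes that estimate and, for option 2, sets reserved concurrency through boto3's Lambda `put_function_concurrency` call. The helper names and the load figures in the example are hypothetical, and `lambda_client` is assumed to be `boto3.client("lambda")`.

```python
import math

DEFAULT_ACCOUNT_CONCURRENCY = 1_000  # typical default regional quota; yours may differ


def required_concurrency(messages_per_second: float, avg_invocation_secs: float,
                         batch_size: int = 1) -> int:
    """Estimate concurrency as invocations/second x seconds per invocation.

    avg_invocation_secs is the duration of one invocation handling a whole batch.
    """
    invocations_per_second = messages_per_second / batch_size
    return math.ceil(invocations_per_second * avg_invocation_secs)


def cap_concurrency(lambda_client, function_name: str, limit: int) -> int:
    """Option 2: reserve (and thereby cap) concurrency for one function."""
    if limit < 1:
        raise ValueError("a limit of 0 would throttle every invocation")
    resp = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
    return resp["ReservedConcurrentExecutions"]
```

For example, at a hypothetical 1,000 messages per second, a 4-second function handling one message per invocation needs about required_concurrency(1000, 4.0) = 4,000 concurrent executions, well above the default; a routing function that handles batches of 10 in 0.5 s needs only about required_concurrency(1000, 0.5, batch_size=10) = 50.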

I would recommend the first option, unless you want to limit the concurrency because the function calls some downstream service which has limited resources.
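A minimal sketch of the routing function from the question, written for the SQS trigger. With the trigger, Lambda polls the queue and deletes messages whose invocation succeeds, so the function only routes. It returns a partial batch response (`batchItemFailures`), which Lambda honors only if the event source mapping has `FunctionResponseTypes=["ReportBatchItemFailures"]`. The routing table, topic ARNs, and helper names are hypothetical placeholders, and the message body is assumed to be JSON containing `student_last_name`.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical routing table and placeholder ARNs; replace with your own topics.
TOPIC_BY_LAST_NAME: Dict[str, str] = {
    "Smith": "arn:aws:sns:us-east-1:123456789012:smith-students",
}
DEFAULT_TOPIC = "arn:aws:sns:us-east-1:123456789012:other-students"


def choose_topic(body: Dict[str, Any]) -> str:
    """Route on the student_last_name field; unknown values go to DEFAULT_TOPIC."""
    return TOPIC_BY_LAST_NAME.get(body.get("student_last_name"), DEFAULT_TOPIC)


def make_handler(sns) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Wrap an SNS client (boto3.client("sns") in a real function) in a handler."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        failures: List[Dict[str, str]] = []
        for record in event.get("Records", []):
            try:
                body = json.loads(record["body"])
                sns.publish(TopicArn=choose_topic(body), Message=record["body"])
            except Exception:
                # Mark only this message as failed so Lambda returns it to the queue;
                # requires ReportBatchItemFailures on the event source mapping.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

In the deployed module you would build the handler once at import time, for example `handler = make_handler(boto3.client("sns"))`, so the client is reused across invocations.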

Answered a year ago by Uri (AWS Expert); reviewed a year ago by an AWS Expert.
  • Thanks for the response. Regarding "the function calls some downstream service which has limited resources": do you mean this feature lets us limit the number of instances of a particular Lambda function that calls some downstream service? Could you give more details about such a "downstream service" and its "limited resources"? Thank you!

  • An example is a downstream database, used by the Lambda function, that can accept up to 100 connections. If you let the function scale, it may reach a concurrency of 1,000, which would create 1,000 connections. To prevent that, you can set reserved concurrency on your function to 100, which ensures that you never have more than 100 concurrent invocations. (For a database specifically, you can use RDS Proxy, which pools connections, to overcome this without limiting concurrency.)

  • Hi, I am working on a similar requirement: I expect 3 million JSON messages per day, each approximately 2 KB. These messages need to be stored in S3 for further processing and then moved to Snowflake for reporting.

    Will Lambda with concurrency be enough for this processing? Should it be scheduled or triggered? Does the AWS SDK support reading messages in batches of 10,000 from a standard queue?

    Any suggestions/recommendations are welcome.

    Thanks, Satyen.
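On the last question: a single ReceiveMessage call returns at most 10 messages, so the SDK cannot read 10,000 in one call. The SQS trigger's event source mapping, however, accepts a BatchSize of up to 10,000 for a standard queue, provided MaximumBatchingWindowInSeconds is at least 1. The sketch below builds the arguments for boto3's Lambda `create_event_source_mapping` call and does the back-of-envelope arithmetic for the stated load. The helper names are hypothetical, and the limits are hard-coded from the AWS documentation as I understand it, so check them against the current docs.

```python
# Limits hard-coded from the AWS docs (assumption: verify before relying on them).
STANDARD_QUEUE_MAX_BATCH = 10_000       # event source mapping, standard queue
MAX_BATCHING_WINDOW_SECS = 300
INVOKE_PAYLOAD_LIMIT_BYTES = 6_291_456  # 6 MB synchronous invocation payload


def average_rate(messages_per_day: int) -> float:
    """Average arrival rate in messages per second."""
    return messages_per_day / 86_400


def payload_cap(message_bytes: int) -> int:
    """Upper bound on messages per invocation imposed by the payload limit.
    Real events also carry per-record metadata, so the practical number is lower."""
    return INVOKE_PAYLOAD_LIMIT_BYTES // message_bytes


def sqs_trigger_config(queue_arn: str, function_name: str,
                       batch_size: int, window_secs: int) -> dict:
    """Build keyword arguments for lambda_client.create_event_source_mapping()."""
    if not 1 <= batch_size <= STANDARD_QUEUE_MAX_BATCH:
        raise ValueError("standard-queue batch size must be 1..10,000")
    if batch_size > 10 and window_secs < 1:
        raise ValueError("batch sizes above 10 need a batching window of at least 1 s")
    if not 0 <= window_secs <= MAX_BATCHING_WINDOW_SECS:
        raise ValueError("batching window must be 0..300 s")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_secs,
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
```

For 3 million messages per day, average_rate(3_000_000) is about 34.7 messages per second. A full 10,000-message batch of 2 KB messages would be about 20 MB, and the Lambda documentation states that a batch is also cut off when the payload reaches 6 MB, so payload_cap(2048) = 3,072 messages is the effective ceiling per invocation (somewhat less once record metadata is counted).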
