SendMessageBatch API


Hello,

I am using the Java SDK and calling sendMessageBatch to send messages to a FIFO queue. I am trying to understand exactly how the messages are enqueued. More specifically, I want to know whether all messages in a batch enter the queue and become visible at the same time, or whether they are posted one by one behind the scenes. I have gone through as much documentation as I could find and have not found the answer. The closest I have come is this:

"For a FIFO queue, multiple messages within a single batch are enqueued in the order they are sent."

from the documentation here: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessageBatch.html

Unfortunately, this doesn't give me the information I am looking for, and I am hoping someone may be able to help with this!

Thank you,

Brandon

asked 5 years ago, 460 views
3 Answers
Accepted Answer

You are right. I can confirm that, currently, if the following conditions are met for an SQS FIFO queue:

  1. The queue has no messages available to be returned (it is either empty or all of its messages are in flight)
  2. A long-poll ReceiveMessage call is already waiting on the queue
  3. A batch of messages is sent to the FIFO queue

then the existing long-poll ReceiveMessage call will complete immediately and return only the first message of the sent batch. If another long-poll ReceiveMessage call was pending as well, it will return the remaining messages of the batch (but only if they have a different message group ID; the rules of FIFO ordering still apply).

This behavior will affect your scenario, in which you are trying to minimize the number of calls to SQS. I've added a request to change this behavior to the team's backlog.

Kuba
answered 5 years ago

Please provide more info about your use case. Specifically, why would this make a difference?

How batch sends relate to messages becoming visible depends mostly on which message group IDs you assign to the messages in the batch.

If all messages in the batch have the same message group ID, then only the first message becomes available to be received.

If every message in the batch has a different message group ID, then all messages are available to be received.

The above assumes the queue is empty. If there were already messages in the queue, the behavior depends on whether you had invisible (in-flight) messages for those message group IDs.
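To make the last point concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the SQS implementation, and the class and method names (`FifoGroupModel`, `receivableIndexes`) are made up for illustration. It models only one rule: a message is held back while its message group has in-flight messages. It says nothing about how many messages from one group a single receive call returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model for illustration only; this is NOT how SQS is implemented.
class FifoGroupModel {

    // groupIds: the message group ID of each queued message, in queue order.
    // inFlightGroups: groups that currently have received-but-not-deleted messages.
    // Returns the positions of messages that are not blocked by an in-flight group.
    static List<Integer> receivableIndexes(List<String> groupIds, Set<String> inFlightGroups) {
        List<Integer> receivable = new ArrayList<>();
        for (int i = 0; i < groupIds.size(); i++) {
            if (!inFlightGroups.contains(groupIds.get(i))) {
                receivable.add(i);
            }
        }
        return receivable;
    }

    public static void main(String[] args) {
        // Hypothetical batch: two messages for acct-1 and one for acct-2.
        List<String> batch = List.of("acct-1", "acct-2", "acct-1");

        // No group has in-flight messages: nothing is blocked.
        System.out.println(receivableIndexes(batch, Set.of()));          // prints [0, 1, 2]

        // acct-1 has an in-flight message: only the acct-2 message is receivable.
        System.out.println(receivableIndexes(batch, Set.of("acct-1")));  // prints [1]
    }
}
```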

Kuba
answered 5 years ago

Sure, here are some additional details. For the FIFO queue, we are using content-based deduplication but are not setting a Deduplication Id.

For our problem, we are trying to efficiently calculate a specific value on an account in our DB (outside AWS).

For our infrastructure, we have two lambdas (one with a regular SQS queue as a trigger) and a FIFO queue. The flow works something along these lines:

  • An event is published from our system with a list of account Ids that need to be calculated
  • This is published to the SQS Queue which triggers the first lambda (Let's call this the eventHandler)
  • The eventHandler creates a new event for each account Id to be processed and publishes these to the FIFO queue (using sendMessageBatch in batches of up to 10)
  • After the messages are published, the eventHandler asynchronously invokes the second lambda (Let's call this the CalculationHandler) which can pull up to 190 events from the FIFO queue
  • The CalculationHandler does some processing (making API calls into our system), gets a final value, and updates the account
  • The CalculationHandler will then poll the FIFO queue and if no messages exist, it will finish
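For anyone following along, the batching step in the eventHandler looks roughly like the sketch below. This is a simplified, stand-alone version with no AWS SDK calls; `BatchChunker` and `chunk` are names made up for this post. The limit of 10 is the maximum number of entries SendMessageBatch accepts per call.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the batching step; names are hypothetical.
class BatchChunker {

    // SendMessageBatch accepts at most 10 entries per request.
    static final int MAX_BATCH_ENTRIES = 10;

    // Splits items into consecutive chunks of at most `size` elements, preserving order.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical event carrying 23 account IDs.
        List<String> accountIds = new ArrayList<>();
        for (int i = 1; i <= 23; i++) {
            accountIds.add("acct-" + i);
        }
        for (List<String> batch : chunk(accountIds, MAX_BATCH_ENTRIES)) {
            // In the real handler, each chunk would become one sendMessageBatch call.
            System.out.println(batch.size() + " entries: " + batch);
        }
    }
}
```

With 23 account IDs this produces three chunks of 10, 10 and 3 entries, so three sendMessageBatch calls.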

As a side note, we have concurrency set to be larger than 1 so multiple lambdas can be invoked at once.

The FIFO queue is important here because if 3 events come in for the same account and they are all sitting in the FIFO queue, we only need to do the calculation once (this is why we use content-based deduplication). Deduplication will not work because of the 5-minute wait after the account has been processed to avoid duplicates being published.
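For reference, the 5-minute deduplication interval can be pictured with a small stand-alone sketch. This is a toy model under my own assumptions, not the SQS implementation: `DedupWindowModel` and `offer` are made-up names, and the only documented facts it relies on are that content-based deduplication hashes the message body with SHA-256 and that the interval is 5 minutes. The model counts the interval from the moment a body was first accepted, regardless of whether that message has since been processed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy model of a 5-minute content-based deduplication window; NOT the SQS implementation.
class DedupWindowModel {

    static final long WINDOW_MILLIS = 5 * 60 * 1000L;

    // Hash of the body -> time (ms) at which that body was last accepted.
    private final Map<String, Long> acceptedAt = new HashMap<>();

    // Hex-encoded SHA-256 of the message body.
    static String sha256Hex(String body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns true if the message would be delivered, false if it is dropped as a duplicate.
    boolean offer(String body, long nowMillis) {
        String key = sha256Hex(body);
        Long start = acceptedAt.get(key);
        if (start != null && nowMillis - start < WINDOW_MILLIS) {
            return false; // same body seen inside the window: treated as a duplicate
        }
        acceptedAt.put(key, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        DedupWindowModel queue = new DedupWindowModel();
        String body = "{\"accountId\":\"42\"}"; // hypothetical message body
        System.out.println(queue.offer(body, 0L));        // first event for the account
        System.out.println(queue.offer(body, 60_000L));   // same body one minute later
        System.out.println(queue.offer(body, 300_000L));  // same body once the window has passed
    }
}
```

In this model the second event is dropped even if the first one was already received and deleted, while the third one is accepted again.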

I am trying to estimate the number of API calls the CalculationHandler makes back into our system under given conditions. However, in some scenarios I have noticed the following: 9 events are published (5 seconds apart) and are sent to the FIFO queue in 9 separate batches (I kept the number of accounts per event under 10 so that each event would be published in one batch), but the CalculationHandler pulls messages from the queue 16 times. The total number of messages pulled matches the number of accounts published, but each pull returns fewer messages than were published in a batch. This leads me to believe that when sendMessageBatch is called, not all messages become available in the queue at the same time, which results in more pulls than batches published and in more API calls to our system.

I was hoping to get clarification on whether sendMessageBatch would make all messages visible at the same time or if there could be a delay in some messages?

Back to your point, in this use case there should be no messages in the same batch that have the same message group id. Any messages with the same message group Id that would already be sitting in the queue would have the same content and should be removed through content-based deduplication.

answered 5 years ago
