I have a setup where I'm using Celery as the task queue with Amazon SQS FIFO. My goal is to ensure sequential processing of tasks within the same message group ID, while allowing tasks with different message group IDs to be processed in parallel. However, despite following the recommended configurations and understanding the behavior of SQS message groups, I'm experiencing parallel processing of tasks within the same message group by multiple Celery worker processes. How can I ensure that tasks with the same message group ID are processed sequentially by a single worker process, while maintaining parallel processing for tasks with different message group IDs?
Some extra details (for reference): for Celery I haven't set the `--concurrency` option, so by default it spawns 4 pool processes (the number of cores). I am passing the message group ID using the following syntax:
```python
message_properties = {
    "MessageGroupId": f"{supplier_id}",
}
# Pass the message group ID as a transport option only, not as a task
# argument, so it doesn't leak into the task's kwargs.
celery_task.s(param1).apply_async(**message_properties)
```
I have also made sure the queue is FIFO and its name ends with `.fifo`. Additional transport settings: `{'polling_interval': 60, 'wait_time_seconds': 10, 'visibility_timeout': 600}`.
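For reference, here is roughly how those options are wired up in my app; the queue name and region below are placeholders for the real values:

```python
from celery import Celery

app = Celery("tasks", broker="sqs://")

# Transport options as listed above; region is a placeholder.
app.conf.broker_transport_options = {
    "region": "us-east-1",
    "polling_interval": 60,
    "wait_time_seconds": 10,
    "visibility_timeout": 600,
}

# Queue name is a placeholder; SQS FIFO queue names must end in .fifo.
app.conf.task_default_queue = "my-queue.fifo"
```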
Thanks for your response. There does not seem to be a direct way to configure the batch size in Celery, and setting the worker concurrency to 1 would cause performance issues.
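The closest setting I could find is the prefetch multiplier, which limits how many messages each worker reserves at once rather than the SQS batch size itself. A minimal sketch of what I'd try (these values are guesses on my part, not a confirmed fix):

```python
# Limit each pool process to reserving one message at a time, so a
# single worker doesn't hold several in-flight messages from the
# same message group.
app.conf.worker_prefetch_multiplier = 1

# Acknowledge (delete) a message only after the task finishes, so the
# message stays in flight, and its group locked, for the duration of
# the task.
app.conf.task_acks_late = True
```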
As per the Amazon SQS FIFO documentation:

> When receiving messages from a FIFO queue with multiple message group IDs, Amazon SQS first attempts to return as many messages with the same message group ID as possible. This allows other consumers to process messages with a different message group ID. When you receive a message with a message group ID, no more messages for the same message group ID are returned unless you delete the message or it becomes visible.
According to the above description, my understanding is that even with multiple consumers (multiple Celery workers in my case) message group IDs should still be honored. Or am I missing something? Any help would be appreciated, thanks.
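For completeness, here is a minimal boto3 sketch of the receive behavior I expect based on the documentation above; the queue URL is a placeholder, and the `print` stands in for real task handling:

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
# Placeholder queue URL.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo"

# SQS tries to return as many messages from the same message group as
# possible in one call; while any of them are in flight, other
# consumers should not receive messages from that group.
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=10,
)

for message in response.get("Messages", []):
    print(message["Body"])  # stand-in for real task handling
    # Deleting the message releases its message group for the next
    # consumer; until then (or until the visibility timeout expires),
    # the group stays locked.
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
    )
```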