How can I properly scale an ECS with SQS Standard?

Hello, nice to meet you. My name is Andrew.

On this occasion, I'm looking for help or guidance on how to autoscale an ECS container in Fargate. I've already read the documentation and understand how to do it and what mechanisms exist for scaling, but I'd like to present my case here to see what you think and if, based on your experience, you can suggest a better way to scale my application.

My application takes at most 23 seconds to process a message, and about 5 seconds on average. The messages are read from a standard SQS queue. Concurrency is set to MAX_NUMBER_OF_MESSAGES=10 and MAX_CONCURRENT_PROCESS=20, for a total of 200 concurrent messages.

Following the documentation, I've built scaling policies on the visible messages, oldest message, sent messages, and received messages metrics, but none of them has given good results.

  1. I'm not entirely sure which metric is appropriate for scaling this type of application with high latency.

  2. Scaling is working, but I have over-provisioning, which translates into significant costs.

  3. Scale-in is too abrupt and doesn't stabilize, so in-flight messages are lost when the tasks handling them are stopped.

As additional information, I'm using 15 tasks on machines with 2 vCPUs and 4GB of RAM.

Can anyone help me, please?

1 Answer

Hello Andrew, nice to meet you! I'd be happy to help you optimize your ECS autoscaling configuration for your SQS-based workload.

Based on your scenario, the key issue is that standard SQS metrics like visible messages don't account for the actual processing capacity of your containers. For your application with variable processing times (5-23 seconds) and specific concurrency limits, you need a more sophisticated approach.

The Right Metric: Backlog Per Instance

Instead of using raw SQS metrics, you should create a custom CloudWatch metric called "backlog per instance" (for ECS, backlog per task). It is calculated by dividing the number of visible messages in your queue (ApproximateNumberOfMessagesVisible) by the number of running ECS tasks. Because it relates queue depth to the capacity currently available, it tells you far more reliably whether you need to scale.

To determine your target value for this metric, use this formula:

  • Acceptable backlog per task = (Acceptable latency) / (Average processing time per message)

For example, if you can tolerate 60 seconds of latency and your average processing time is 5 seconds, your target would be 60 / 5 = 12 messages per task. That figure assumes each task handles one message at a time. Because your tasks process messages concurrently (your MAX_NUMBER_OF_MESSAGES and MAX_CONCURRENT_PROCESS settings), multiply it by the number of messages a single task works on in parallel when setting this target.
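
As a rough worked example of the formula above, here is a minimal Python sketch. The function names and the `concurrency_per_task` parameter are hypothetical, and the value 20 in the second call is only an assumption about how many messages one task handles in parallel; substitute whatever matches your setup.

```python
def acceptable_backlog_per_task(acceptable_latency_s: float,
                                avg_processing_s: float,
                                concurrency_per_task: int = 1) -> float:
    """Messages one task may have queued while still meeting the latency goal."""
    if avg_processing_s <= 0:
        raise ValueError("avg_processing_s must be positive")
    if concurrency_per_task < 1:
        raise ValueError("concurrency_per_task must be at least 1")
    # latency / processing time = messages one worker clears within the budget;
    # multiply by the number of workers running in parallel inside the task.
    return acceptable_latency_s / avg_processing_s * concurrency_per_task


def backlog_per_task(visible_messages: int, running_tasks: int) -> float:
    """Current backlog per task; zero running tasks is treated as one."""
    return visible_messages / max(running_tasks, 1)


print(acceptable_backlog_per_task(60, 5))       # 12.0
print(acceptable_backlog_per_task(60, 5, 20))   # 240.0 (assumed concurrency of 20)
```

Scaling out is warranted when `backlog_per_task(...)` stays above the target; scaling in is safe when it stays well below it.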

Addressing Your Specific Issues:

  1. Choosing the right metric: Request concurrency or backlog per task is more appropriate than simple queue depth for applications with variable processing times. You should have your application publish custom metrics to CloudWatch showing the actual concurrent requests being processed. This can be reported at least once per minute, and you can use the average concurrency across all replicas as your scaling metric.

  2. Over-provisioning: This happens because standard queue metrics don't reflect your actual processing capacity. By switching to a backlog-per-task metric with target tracking, the autoscaler will more accurately match capacity to demand. For workloads where processing time varies widely, you can use CloudWatch metric math to combine queue depth and running task count into a meaningful backlog-per-task metric, preventing erratic scaling while still responding to real demand.

  3. Abrupt scale-in and lost messages: Because you run on Fargate, the relevant feature is ECS task scale-in protection (instance scale-in protection applies to EC2 Auto Scaling groups, not to Fargate tasks). It prevents ECS from terminating tasks that are actively processing messages during a scale-in event. Your application should enable protection when it starts processing a message and disable it when processing completes, so tasks aren't terminated while handling in-flight messages.
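
For point 3, a minimal sketch of toggling task scale-in protection from inside the container, using only the Python standard library. It assumes the container agent exposes the task protection endpoint at `$ECS_AGENT_URI/task-protection/v1/state` and that the task role is allowed to update task protection; the function names and the 2-minute expiry are hypothetical choices, not values from the question.

```python
import json
import os
import urllib.request


def build_protection_request(agent_uri: str, enabled: bool,
                             expires_in_minutes: int = 2) -> urllib.request.Request:
    """Build the PUT request that turns task scale-in protection on or off."""
    body = {"ProtectionEnabled": enabled}
    if enabled:
        # Safety net: protection lapses by itself if the worker crashes.
        body["ExpiresInMinutes"] = expires_in_minutes
    return urllib.request.Request(
        url=f"{agent_uri}/task-protection/v1/state",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def set_task_protection(enabled: bool, expires_in_minutes: int = 2) -> None:
    """Send the request; only works inside a running ECS task."""
    agent_uri = os.environ["ECS_AGENT_URI"]  # injected by ECS into the container
    request = build_protection_request(agent_uri, enabled, expires_in_minutes)
    with urllib.request.urlopen(request, timeout=2) as response:
        response.read()
```

Since each task works on many messages at once, it is probably simpler to keep an in-flight counter and call `set_task_protection(True)` when it goes from 0 to 1 and `set_task_protection(False)` when it returns to 0, rather than toggling per message.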

Implementation Approach:

For predictable workloads, step scaling tied to your custom backlog metric works well (for example, add one task at a certain threshold, two tasks at a higher threshold). However, given your variable processing times (5-23 seconds), target tracking with a custom metric combining queue depth and running task count would likely work better and keep the scaling from reacting erratically.
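
To illustrate the step-scaling idea, here is a small sketch of how step adjustments map a threshold breach to a number of tasks to add. The dictionary keys follow the step adjustment fields used by Application Auto Scaling; the thresholds and the helper `tasks_to_add` are hypothetical and only emulate the scale-out lookup locally.

```python
# Bounds are relative to the alarm threshold, not absolute metric values.
STEP_ADJUSTMENTS = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 12, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 12, "ScalingAdjustment": 2},
]


def tasks_to_add(backlog_per_task: float, alarm_threshold: float,
                 steps=STEP_ADJUSTMENTS) -> int:
    """Return the scale-out adjustment for the current backlog per task."""
    breach = backlog_per_task - alarm_threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        # For scale-out, the lower bound is inclusive and the upper exclusive.
        if lower <= breach < upper:
            return step["ScalingAdjustment"]
    return 0  # metric is below the alarm threshold: no scale-out


print(tasks_to_add(10, alarm_threshold=12))  # 0
print(tasks_to_add(18, alarm_threshold=12))  # 1
print(tasks_to_add(30, alarm_threshold=12))  # 2
```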

You'll need to use the AWS CLI or SDK to publish your custom metrics to CloudWatch, as this isn't available through the console alone. Once your custom metric is publishing, you can create a target tracking scaling policy that maintains your desired backlog-per-task target value.
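
A sketch of what such a target tracking configuration could look like when built with metric math, expressed as a plain Python dictionary. The queue, cluster, and service names, the target value, and the cooldowns are all placeholders, and the `RunningTaskCount` metric assumes Container Insights is enabled on the cluster.

```python
def backlog_per_task_policy(queue_name: str, cluster_name: str, service_name: str,
                            target_value: float,
                            scale_in_cooldown_s: int = 300,
                            scale_out_cooldown_s: int = 60) -> dict:
    """Target tracking configuration that tracks visible messages per running task."""
    return {
        "TargetValue": target_value,
        # A longer scale-in cooldown helps avoid abrupt, repeated scale-in.
        "ScaleInCooldown": scale_in_cooldown_s,
        "ScaleOutCooldown": scale_out_cooldown_s,
        "CustomizedMetricSpecification": {
            "Metrics": [
                {
                    "Id": "m1",
                    "Label": "Visible messages in the queue",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "AWS/SQS",
                            "MetricName": "ApproximateNumberOfMessagesVisible",
                            "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
                        },
                        "Stat": "Sum",
                    },
                    "ReturnData": False,
                },
                {
                    "Id": "m2",
                    "Label": "Running ECS tasks (requires Container Insights)",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "ECS/ContainerInsights",
                            "MetricName": "RunningTaskCount",
                            "Dimensions": [
                                {"Name": "ClusterName", "Value": cluster_name},
                                {"Name": "ServiceName", "Value": service_name},
                            ],
                        },
                        "Stat": "Average",
                    },
                    "ReturnData": False,
                },
                # Only the expression is returned to the scaling policy.
                {"Id": "e1", "Label": "Backlog per task",
                 "Expression": "m1 / m2", "ReturnData": True},
            ]
        },
    }


config = backlog_per_task_policy("my-queue", "my-cluster", "my-service", 240)

# Applying it with boto3 (not executed here; names are placeholders):
# boto3.client("application-autoscaling").put_scaling_policy(
#     PolicyName="backlog-per-task",
#     ServiceNamespace="ecs",
#     ResourceId="service/my-cluster/my-service",
#     ScalableDimension="ecs:service:DesiredCount",
#     PolicyType="TargetTrackingScaling",
#     TargetTrackingScalingPolicyConfiguration=config,
# )
```

With metric math the backlog-per-task value is computed by CloudWatch from existing metrics, so the application does not have to publish it itself.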

This approach should give you much more stable scaling behavior, reduce over-provisioning costs, and prevent message loss during scale-in events.
Sources
Automatically scale your Amazon ECS service - Amazon Elastic Container Service
Configure scaling based on Amazon SQS - Amazon EC2 Auto Scaling
Scaling policy based on Amazon SQS - Amazon EC2 Auto Scaling
Configuring service auto scaling - Amazon Elastic Container Service
re:Invent 2025 - Building event-driven architectures using Amazon ECS with AWS Fargate | AWS re:Post

answered 21 days ago
