Hello Andrew, nice to meet you! I'd be happy to help you optimize your ECS autoscaling configuration for your SQS-based workload.
Based on your scenario, the key issue is that standard SQS metrics like visible messages don't account for the actual processing capacity of your containers. For your application with variable processing times (5-23 seconds) and specific concurrency limits, you need a more sophisticated approach.
The Right Metric: Backlog Per Instance
Instead of using raw SQS metrics, you should create a custom CloudWatch metric called "backlog per instance" (or backlog per task). This is calculated by dividing the number of messages in your queue by the number of running ECS tasks. This metric gives you a much more accurate picture of whether you need to scale.
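As a rough sketch of that calculation, assuming Python with boto3: the namespace `Custom/SQSWorker`, the metric name `BacklogPerTask`, and all function names below are placeholders chosen for illustration, not anything from your setup. It would need to run on a schedule (for example, once per minute).

```python
from typing import Any


def backlog_per_task(visible_messages: int, running_tasks: int) -> float:
    """Queue depth divided by running tasks (zero tasks is treated as one)."""
    return visible_messages / max(running_tasks, 1)


def build_metric_datum(value: float, queue_name: str) -> dict[str, Any]:
    """Build one entry for the MetricData list of CloudWatch PutMetricData."""
    return {
        "MetricName": "BacklogPerTask",  # placeholder metric name
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Value": value,
        "Unit": "Count",
    }


def publish_backlog(queue_url: str, queue_name: str, cluster: str, service: str) -> float:
    """Read queue depth and running task count, then publish the ratio."""
    import boto3  # imported lazily so the pure helpers above work without AWS access

    sqs = boto3.client("sqs")
    ecs = boto3.client("ecs")
    cloudwatch = boto3.client("cloudwatch")

    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    visible = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    described = ecs.describe_services(cluster=cluster, services=[service])
    running = described["services"][0]["runningCount"]

    value = backlog_per_task(visible, running)
    cloudwatch.put_metric_data(
        Namespace="Custom/SQSWorker",  # placeholder namespace
        MetricData=[build_metric_datum(value, queue_name)],
    )
    return value
```

Treating zero running tasks as one is a design choice made here to avoid a division by zero; you may prefer to skip publishing in that case.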
To determine your target value for this metric, use this formula:
- Acceptable backlog per task = (Acceptable latency) / (Average processing time per message)
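For example, if you can tolerate 60 seconds of latency and your average processing time is 5 seconds, your target would be 60 / 5 = 12 messages per task. That figure assumes each task handles one message at a time. Given your concurrency settings (200 concurrent messages total), a task that processes several messages in parallel drains its backlog proportionally faster, so factor your MAX_CONCURRENT_PROCESS value into the target. Also note that 5 seconds is the low end of your 5-23 second range, so base the calculation on your measured average rather than the best case.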
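To make the arithmetic concrete, here is a small sketch in Python. The function name and the `concurrency_per_task` parameter are illustrative assumptions; multiplying by per-task concurrency is one simple way to account for parallel processing, not an official AWS formula.

```python
def acceptable_backlog_per_task(
    acceptable_latency_s: float,
    avg_processing_time_s: float,
    concurrency_per_task: int = 1,
) -> float:
    """Target backlog per task = latency / processing time, scaled by parallelism.

    concurrency_per_task is an assumption: the number of messages a single
    task processes at the same time.
    """
    if avg_processing_time_s <= 0:
        raise ValueError("avg_processing_time_s must be positive")
    if concurrency_per_task < 1:
        raise ValueError("concurrency_per_task must be at least 1")
    return acceptable_latency_s / avg_processing_time_s * concurrency_per_task


# 60 s of tolerated latency, 5 s per message, one message at a time
print(acceptable_backlog_per_task(60, 5))       # 12.0
# Same latency at the slow end of the range (23 s per message): about 2.6
print(acceptable_backlog_per_task(60, 23))
# 5 s per message with a hypothetical 10 messages in parallel per task
print(acceptable_backlog_per_task(60, 5, 10))   # 120.0
```

The spread between 12 and roughly 2.6 shows why the average processing time you plug in matters so much for this workload.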
Addressing Your Specific Issues:
- Choosing the right metric: Request concurrency or backlog per task is more appropriate than simple queue depth for applications with variable processing times. You should have your application publish custom metrics to CloudWatch showing the actual concurrent requests being processed. This can be reported at least once per minute, and you can use the average concurrency across all replicas as your scaling metric.
- Over-provisioning: This happens because standard queue metrics don't reflect your actual processing capacity. By switching to a backlog-per-task metric with target tracking, the autoscaler will more accurately match capacity to demand. For workloads where processing time varies widely, you can use CloudWatch metric math to combine queue depth and in-flight task count into a meaningful backlog-per-task metric, preventing erratic scaling while still responding to real demand.
- Abrupt descaling and lost messages: For long-running tasks, you should use Amazon ECS task scale-in protection. This prevents ECS from terminating tasks that are actively processing messages during a scale-in event. Your application should enable protection when it starts processing a message and disable it when processing completes. Because each task handles several messages concurrently, disable protection only once no messages remain in flight on that task.
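One way to wire up scale-in protection from inside the container is the ECS agent's task protection endpoint (`$ECS_AGENT_URI/task-protection/v1/state`). The sketch below is a minimal illustration under a few assumptions: the `ProtectionTracker` class and helper names are made up for this example, and the 5-minute expiry is an arbitrary value that should comfortably exceed your worst-case processing time.

```python
import json
import os
import threading
import urllib.request
from contextlib import contextmanager


def build_protection_request(agent_uri: str, enabled: bool, expires_in_minutes: int = 5):
    """Build the PUT request for the ECS agent task protection endpoint."""
    body = {"ProtectionEnabled": enabled}
    if enabled:
        body["ExpiresInMinutes"] = expires_in_minutes  # safety net if the task hangs
    return urllib.request.Request(
        url=f"{agent_uri}/task-protection/v1/state",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def set_task_protection(enabled: bool) -> None:
    """Call the agent endpoint; ECS_AGENT_URI is injected into the container by ECS."""
    request = build_protection_request(os.environ["ECS_AGENT_URI"], enabled)
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


class ProtectionTracker:
    """Keeps protection on while at least one message is in flight (hypothetical helper)."""

    def __init__(self, set_protection=set_task_protection):
        self._set_protection = set_protection
        self._in_flight = 0
        self._lock = threading.Lock()

    @contextmanager
    def processing(self):
        with self._lock:
            self._in_flight += 1
            if self._in_flight == 1:  # first in-flight message: turn protection on
                self._set_protection(True)
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1
                if self._in_flight == 0:  # nothing left in flight: turn it off
                    self._set_protection(False)
```

In a worker you would wrap each message handler in `with tracker.processing(): ...`, sharing one tracker across all concurrent handlers in the task.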
Implementation Approach:
For predictable workloads, step scaling tied to your custom backlog metric works well (for example, add one task at a certain threshold, two tasks at a higher threshold). However, given your variable processing times (5-23 seconds), target tracking with a custom metric combining queue depth and in-flight task count would likely work better and prevent the scaling from reacting erratically.
You'll need to use the AWS CLI or SDK to publish your custom metrics to CloudWatch, as this isn't available through the console alone. Once your custom metric is publishing, you can create a target tracking scaling policy that maintains your desired backlog-per-task target value.
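For reference, a target tracking policy on a custom metric could look roughly like the following with boto3 and Application Auto Scaling. The metric name `BacklogPerTask`, the namespace `Custom/SQSWorker`, the policy name, and the cooldown values are placeholders picked for this sketch; substitute whatever your publisher actually emits.

```python
from typing import Any


def build_target_tracking_config(queue_name: str, target_backlog_per_task: float) -> dict[str, Any]:
    """Build the TargetTrackingScalingPolicyConfiguration for a custom metric."""
    return {
        "TargetValue": target_backlog_per_task,
        "CustomizedMetricSpecification": {
            "MetricName": "BacklogPerTask",   # placeholder: your custom metric
            "Namespace": "Custom/SQSWorker",  # placeholder: your namespace
            "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
            "Statistic": "Average",
        },
        "ScaleOutCooldown": 60,   # illustrative values, tune for your workload
        "ScaleInCooldown": 300,
    }


def apply_policy(cluster: str, service: str, config: dict[str, Any]) -> None:
    """Attach the policy to an ECS service already registered as a scalable target."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("application-autoscaling")
    client.put_scaling_policy(
        PolicyName="backlog-per-task-tracking",  # placeholder policy name
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=config,
    )
```

The service has to be registered as a scalable target (with minimum and maximum task counts) before the policy can be attached. A longer scale-in cooldown than scale-out cooldown is a common way to keep the service from shrinking too eagerly.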
This approach should give you much more stable scaling behavior, reduce over-provisioning costs, and prevent message loss during scale-in events.
Sources
Automatically scale your Amazon ECS service - Amazon Elastic Container Service
Configure scaling based on Amazon SQS - Amazon EC2 Auto Scaling
Scaling policy based on Amazon SQS - Amazon EC2 Auto Scaling
Configuring service auto scaling - Amazon Elastic Container Service
re:Invent 2025 - Building event-driven architectures using Amazon ECS with AWS Fargate | AWS re:Post
