How and when to scale in/out really depends on your workload. Is your application primarily CPU-bound? Auto scale on CPU utilization. Is it memory-bound? Auto scale on memory utilization. Do you have a predictable time-based usage profile, such as busy periods during work days and quiet periods during evenings and weekends? Auto scale using scheduled scaling.
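For the scheduled case, here's a minimal sketch using boto3 against an EC2 Auto Scaling group; the group name, schedule, and capacity numbers are all assumptions you'd replace with your own:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out ahead of the workday. Recurrence is a standard cron
# expression, evaluated in UTC unless you also pass TimeZone.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="my-app-asg",   # hypothetical group name
    ScheduledActionName="workday-scale-out",
    Recurrence="0 8 * * 1-5",            # 08:00, Monday-Friday
    MinSize=4,
    MaxSize=12,
    DesiredCapacity=6,
)

# Scale in for evenings and weekends.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="my-app-asg",
    ScheduledActionName="evening-scale-in",
    Recurrence="0 19 * * 1-5",           # 19:00, Monday-Friday
    MinSize=1,
    MaxSize=4,
    DesiredCapacity=2,
)
```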
Based on my experience, concurrent client connections may not be the best metric to use for auto scaling. You probably care most about user experience and cost: you want to provide the best possible user experience (e.g. fastest responses) at the lowest cost. Concurrent connections correlate with CPU, memory, and response time: the more connections, the higher the CPU and memory usage, and the higher the response time. Instead, measure and scale based on response time itself.
ALBs publish a TargetResponseTime metric, but it isn't as useful as it sounds. As noted in the ALB docs, this metric measures "the total time elapsed (in seconds, with millisecond precision) from the time the load balancer sent the request to a target until the target started to send the response headers." That may be what you want, but more likely you want the total processing time: the time elapsed between receiving a request and sending the complete response. The ALB cannot measure that, so instead publish a custom metric from your application and scale based on it.
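As a rough sketch of the custom-metric approach, assuming a Python app using boto3 (the `MyApp` namespace, metric name, and `process()` handler are all hypothetical):

```python
import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def process(request):
    # Placeholder for your application's real request handler.
    ...

def timed_handler(request):
    """Wrap the real handler and publish total processing time."""
    start = time.monotonic()
    response = process(request)
    elapsed_ms = (time.monotonic() - start) * 1000.0

    cloudwatch.put_metric_data(
        Namespace="MyApp",           # hypothetical namespace
        MetricData=[{
            "MetricName": "TotalProcessingTime",
            "Value": elapsed_ms,
            "Unit": "Milliseconds",
        }],
    )
    return response
```

A synchronous PutMetricData call per request adds latency of its own; in practice you'd batch samples, publish from a background thread, or emit them via the CloudWatch embedded metric format in your logs.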
For auto scaling on CPU and memory utilization, use target tracking scaling. For auto scaling on response time, use step scaling with the custom CloudWatch metric published by your application.
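Here's a sketch of both policies with boto3, again assuming an EC2 Auto Scaling group and the hypothetical metric above; the target value, step boundaries, and 500 ms threshold are illustrative, not recommendations:

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Target tracking on CPU: Auto Scaling adds/removes instances to hold
# average CPU near the target value.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-app-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)

# Step scaling on the custom response-time metric: add more capacity
# the further the metric climbs above the alarm threshold.
step_policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-app-asg",
    PolicyName="response-time-step-scaling",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[
        # 0-200 ms over the alarm threshold: add 1 instance.
        {"MetricIntervalLowerBound": 0.0,
         "MetricIntervalUpperBound": 200.0,
         "ScalingAdjustment": 1},
        # More than 200 ms over: add 2 instances.
        {"MetricIntervalLowerBound": 200.0,
         "ScalingAdjustment": 2},
    ],
)

# The alarm on the custom metric is what actually fires the step policy.
cloudwatch.put_metric_alarm(
    AlarmName="high-total-processing-time",
    Namespace="MyApp",
    MetricName="TotalProcessingTime",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=500.0,                 # ms; pick your own latency SLO
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[step_policy["PolicyARN"]],
)
```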
Hope this info helps.