How and when to scale in/out really depends on your workload. Is your application primarily CPU bound? Auto scale on CPU utilization. Is it memory bound? Auto scale on memory utilization. Do you have a predictable time-based usage profile, such as busy times during work days and quiet times during evenings/weekends? Auto scale using scheduled scaling.
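The three options above could look roughly like this with Application Auto Scaling. This is a minimal sketch that assumes the workload is an ECS service. The cluster and service names, target values, capacities, cooldowns and schedules are all hypothetical placeholders. The helper functions only build the request parameters, so you can inspect them without AWS access. boto3 is used only inside apply().

```python
# Assumption: an ECS service scaled through Application Auto Scaling.
# All names, targets, capacities and schedules below are hypothetical.
SCALABLE_DIMENSION = "ecs:service:DesiredCount"


def resource_id(cluster: str, service: str) -> str:
    return f"service/{cluster}/{service}"


def target_tracking_policy(cluster: str, service: str, metric_type: str, target: float) -> dict:
    """Keep a predefined ECS metric (average CPU or memory %) near `target`."""
    return {
        "PolicyName": f"{service}-{metric_type}",
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id(cluster, service),
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric_type},
            "ScaleOutCooldown": 60,   # seconds; add capacity quickly
            "ScaleInCooldown": 300,   # seconds; remove capacity more cautiously
        },
    }


def scheduled_action(cluster: str, service: str, name: str, cron: str,
                     min_cap: int, max_cap: int) -> dict:
    """Change the min/max task count on a cron schedule (UTC by default)."""
    return {
        "ServiceNamespace": "ecs",
        "ScheduledActionName": name,
        "ResourceId": resource_id(cluster, service),
        "ScalableDimension": SCALABLE_DIMENSION,
        "Schedule": f"cron({cron})",
        "ScalableTargetAction": {"MinCapacity": min_cap, "MaxCapacity": max_cap},
    }


def apply(cluster: str, service: str) -> None:
    import boto3  # imported lazily so the builders above work without AWS

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId=resource_id(cluster, service),
        ScalableDimension=SCALABLE_DIMENSION,
        MinCapacity=1,
        MaxCapacity=10,
    )
    # CPU bound: track average CPU. For a memory-bound service, use
    # "ECSServiceAverageMemoryUtilization" instead.
    client.put_scaling_policy(**target_tracking_policy(
        cluster, service, "ECSServiceAverageCPUUtilization", 60.0))
    # Predictable profile: raise the floor on weekday mornings, lower it in the evening.
    client.put_scheduled_action(**scheduled_action(
        cluster, service, "weekday-morning", "0 8 ? * MON-FRI *", 4, 10))
    client.put_scheduled_action(**scheduled_action(
        cluster, service, "weekday-evening", "0 19 ? * MON-FRI *", 1, 4))
```

Scheduled actions and target tracking can be combined. The schedule moves the min/max bounds, and the target tracking policy adjusts the task count within those bounds.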
In my experience, concurrent client connections may not be the best metric to use for auto scaling. You probably care most about user experience and cost: you want to provide the best possible user experience (e.g. fastest performance) for the lowest cost. Concurrent connections are probably correlated with CPU, memory, and response time. The more connections you have, the higher the CPU and memory usage and the longer the response time. Connections are only a proxy, so measure and scale based on response time instead, because that is what users actually experience.
ALBs publish a TargetResponseTime metric, but it isn't very useful for this. As noted in the ALB docs, this metric measures "the total time elapsed (in seconds, with millisecond precision) from the time the load balancer sent the request to a target until the target started to send the response headers." That may be what you want. More likely, though, you want the total processing time: the time elapsed between receiving a request and sending the complete response, including the body. The ALB cannot provide that, so you can publish a custom metric instead and scale based on it.
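Below is a rough sketch of the custom-metric approach, assuming a Python WSGI application running as an ECS service. The middleware stops the clock when the server closes the response iterable, which happens after the full body has been produced, rather than when the headers go out. The namespace, metric name, dimension and target value are hypothetical. The batching strategy (one CloudWatch statistic set per publish call) is my own choice and not something the ALB or ECS requires.

```python
import time
from typing import Callable, Iterable, List

# Hypothetical names: change these to match your application.
NAMESPACE = "MyApp"
METRIC_NAME = "TotalProcessingTime"
DIMENSIONS = [{"Name": "ServiceName", "Value": "my-service"}]


class ResponseTimer:
    """WSGI middleware: measures time from app call until the body is closed."""

    def __init__(self, app: Callable, on_sample: Callable[[float], None]):
        self.app = app
        self.on_sample = on_sample  # called with elapsed seconds

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        body = self.app(environ, start_response)
        return _TimedBody(body, start, self.on_sample)


class _TimedBody:
    """Wraps the response iterable so timing ends when the server closes it."""

    def __init__(self, body: Iterable[bytes], start: float, on_sample):
        self._body, self._start, self._on_sample = body, start, on_sample

    def __iter__(self):
        return iter(self._body)

    def close(self):
        try:
            close = getattr(self._body, "close", None)
            if close is not None:
                close()
        finally:
            self._on_sample(time.perf_counter() - self._start)


def build_metric_data(samples_s: List[float]) -> dict:
    """Aggregate samples (seconds) into one statistic set in milliseconds."""
    ms = [s * 1000.0 for s in samples_s]
    return {
        "Namespace": NAMESPACE,
        "MetricData": [{
            "MetricName": METRIC_NAME,
            "Dimensions": DIMENSIONS,
            "Unit": "Milliseconds",
            "StatisticValues": {
                "SampleCount": float(len(ms)),
                "Sum": sum(ms),
                "Minimum": min(ms),
                "Maximum": max(ms),
            },
        }],
    }


def response_time_policy(cluster: str, service: str, target_ms: float) -> dict:
    """Target tracking policy on the average of the custom metric."""
    return {
        "PolicyName": f"{service}-response-time",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_ms,
            "CustomizedMetricSpecification": {
                "MetricName": METRIC_NAME,
                "Namespace": NAMESPACE,
                "Dimensions": DIMENSIONS,
                "Statistic": "Average",
                "Unit": "Milliseconds",
            },
        },
    }


def publish(samples_s: List[float]) -> None:
    """Send one aggregated data point; call periodically, e.g. every 60 s."""
    if not samples_s:
        return  # a statistic set needs at least one sample
    import boto3  # lazy import so the rest works without AWS credentials
    boto3.client("cloudwatch").put_metric_data(**build_metric_data(samples_s))
```

Every task publishes the same metric name and dimensions, so CloudWatch combines their statistic sets. The "Average" statistic then becomes the mean across all tasks. Target tracking works best when the metric changes roughly in proportion to capacity. Response time does not always behave that way, so a step scaling policy on a CloudWatch alarm for the same metric may be worth considering as an alternative.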
Hope this info helps.