Q: Why can't ECS terminate EC2 instances with 0 running tasks? ECS starts new EC2 instances even though Auto Scaling Policies are not in alarm.
- One quick bit of background clarification: ECS itself should never launch new instances directly. It should only change the CapacityProviderReservation metric, which triggers the CloudWatch alarm, which in turn triggers the ASG scaling policy. That's the only way ECS should be able to (indirectly) cause instances to launch or terminate.
- If instances are being launched some other way, then it's possible they're not being registered with the capacity provider correctly, so it can't calculate the cluster size correctly and therefore can't scale in.
- An ASG usually only scales when its desired capacity is changed through a scaling policy, but there are other situations where it launches or terminates instances. I'd suggest going into the Activity History of the ASG to see the reason for each launch/terminate event. I'd guess in your case it's from Spot instances being reclaimed by EC2, and the ASG replacing them. If you don't have Capacity Rebalance enabled on the ASG, then the instances will be terminated and not gracefully drained. With it disabled, the activity history message will show the instances were replaced due to failing EC2 health checks.
- Are instance weights set on the ASG? If so, this isn't supported by ECS and will cause scaling issues, since the capacity provider assumes they're not configured.
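To make the metric-driven chain above more concrete, here is a minimal Python sketch of how CapacityProviderReservation is commonly described: the number of instances ECS thinks it needs, divided by the number currently running in the capacity provider, times 100. The function names, the 0/0 and N=0 special cases, and the default target of 100 are my assumptions based on AWS's published descriptions of cluster auto scaling, not something taken from your setup.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """Approximate CapacityProviderReservation as (needed / running) * 100.

    Assumed special cases (per AWS's public description of cluster auto scaling):
      - needed == 0 and running == 0 -> 100 (nothing to do)
      - needed > 0 and running == 0  -> 200 (scale out from zero)
    """
    if instances_needed < 0 or instances_running < 0:
        raise ValueError("instance counts must be non-negative")
    if instances_running == 0:
        return 100.0 if instances_needed == 0 else 200.0
    return instances_needed / instances_running * 100.0


def scaling_direction(reservation: float, target_capacity: float = 100.0) -> str:
    """Compare the metric with the capacity provider's target capacity.

    target_capacity=100.0 is an assumed default; use your own setting.
    """
    if reservation > target_capacity:
        return "scale-out"
    if reservation < target_capacity:
        return "scale-in"
    return "steady"
```

For example, if ECS needs 2 instances but the capacity provider sees 4, the metric is 50, which is below a target of 100, so a scale-in is expected. If instances aren't registered with the capacity provider, the counts feeding this calculation are off, which is one way scale-in can fail to happen.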
Q: "Desired size" of ASG is sometimes bigger than "Current size". That also might happen even if Auto Scaling Policies are not in alarm
- The ASG will always try to meet the desired capacity, and will keep retrying if there are launch (or terminate) failures.
- Check the Activity History of the ASG to see if there are launch failures. Since you're using Spot, my guess is you'll find launch failures saying there's no capacity. This means the ASG has asked EC2 for all your configured instance types, and none of them had capacity at the time. If you don't already, we recommend at least 10 different instance types when using Spot. This is a very general recommendation, and you might need more for some regions, instance types, or if it's a large workload. Use the Spot Placement Score (SPS) as a better rough guide to see if you have enough instance types in the ASG.
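If you'd rather check the Activity History from a script than from the console, something like the following Python sketch might help. It assumes boto3 is installed and credentials are configured. `summarize_activities` and `fetch_activities` are hypothetical helper names of mine. The fields read (`StatusCode`, `Description`, `Cause`, `StatusMessage`) are the ones returned by the Auto Scaling `DescribeScalingActivities` API.

```python
from collections import Counter


def summarize_activities(activities):
    """Count ASG scaling activities by status and collect the failed ones.

    `activities` is a list of dicts shaped like the "Activities" entries
    returned by DescribeScalingActivities.
    """
    status_counts = Counter(a.get("StatusCode", "Unknown") for a in activities)
    failures = [
        {
            "description": a.get("Description", ""),
            "cause": a.get("Cause", ""),
            "message": a.get("StatusMessage", ""),
        }
        for a in activities
        if a.get("StatusCode") == "Failed"
    ]
    return {"status_counts": dict(status_counts), "failures": failures}


def fetch_activities(asg_name, max_records=100):
    """Fetch recent scaling activities for one ASG (hypothetical helper)."""
    import boto3  # third-party; imported here so the summary helper works without it

    client = boto3.client("autoscaling")
    response = client.describe_scaling_activities(
        AutoScalingGroupName=asg_name,
        MaxRecords=max_records,
    )
    return response["Activities"]
```

Calling `summarize_activities(fetch_activities("your-asg-name"))` (the name is a placeholder) and reading the `message` and `cause` of each failure should show whether the gap between desired and current size comes from failed launches, and `cause` on successful launches should show why an instance was replaced.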
Those are some general answers based on your setup, but in the end it's hard to tell exactly what's happening without seeing the resources themselves, so you might be best off opening a support case for a more exact answer.