Should ECS Service Task start be triggered by ASG capacity 0->1?


TL;DR: If an ECS cluster's ASG scales from 0->1 running instances, should a Service with Task Desired=1 and Running=0 automatically react and start a Task, or does it have to be prodded?

I have an EC2-backed ECS cluster with an EC2 Auto Scaling group (ASG). The ASG can be scaled to zero, either manually or via an alarm. When the ASG scales to zero, ECS notices and stops the Task, so Desired=1, Running=0. All good.

If I later scale the ASG to Desired>0 (e.g. Desired=1) and instances start, they appear in ECS. However, this does not seem to trigger the ECS Service to start a Task, even though it now has the capacity to do so and is still at Desired=1, Running=0.

Is this expected? Does an ECS Service need to be prodded in some way so that it notices it now has capacity to deploy tasks, and if so, what is the best way to prod it?

Edit: I'm now finding that ECS does start a task, but only after a considerable delay (>10 minutes on all occasions) once the ASG has reached desired capacity, the new instances have started, and the ECS agent has registered with ECS. Is this some sort of tunable polling interval, or an instance warm-up safety measure?

asked 2 years ago, 1094 views
1 Answer
Accepted Answer

You are observing ECS's retry behavior. If ECS cannot start a Task in a Service because there are insufficient resources, ECS will retry scheduling the Task until it is successful (or the desired count is reduced, negating the need to schedule a replacement task).

In line with AWS best practices, ECS retries with exponential backoff: the interval between retries grows after each scheduling failure. So, if a Task has failed to schedule for some time, you may see a significant delay between adding new capacity and the Task being rescheduled. The maximum retry interval is 10-15 minutes.
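
To illustrate why the delay grows the longer a Task has been unschedulable, here is a minimal sketch of capped exponential backoff. The base interval, the doubling factor, and the function name are assumptions chosen for illustration; ECS does not publish its actual parameters. Only the cap (15 minutes) reflects the upper bound mentioned above.

```python
def backoff_intervals(base_seconds, factor, cap_seconds, attempts):
    """Return the wait before each retry, growing geometrically up to a cap."""
    intervals = []
    for attempt in range(attempts):
        # The wait grows by `factor` per failed attempt, but never exceeds the cap.
        intervals.append(min(cap_seconds, base_seconds * factor ** attempt))
    return intervals


# Assumed values for illustration only; not ECS's real settings.
intervals = backoff_intervals(base_seconds=15, factor=2, cap_seconds=900, attempts=8)
print(intervals)
```

With these assumed values, a Service that has been failing to place a Task for a while would already be waiting at the cap, so capacity added just after a retry could sit idle for most of that interval.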

If you are intentionally terminating EC2 instances in order to save cost, it is recommended that you also reduce the Desired Count of the ECS services on the cluster so that their Tasks all fit on the remaining capacity. If you use EC2 Auto Scaling Capacity Providers with ECS, ECS will manage the ASG's desired capacity for you. Alternatively, you can use Fargate, which is serverless; you only pay for Tasks that are actively running.
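
As a rough sketch of keeping the two counts in step when you scale manually, the helper below changes the ASG capacity and the Service Desired Count together. It assumes boto3 clients are passed in (`boto3.client("autoscaling")` and `boto3.client("ecs")`); the function name `scale_cluster` and all resource names in the usage comment are hypothetical.

```python
def scale_cluster(asg_client, ecs_client, asg_name, cluster, service,
                  instance_count, task_count):
    """Set the ASG desired capacity and the ECS Service desired count together.

    Keeping both in step avoids leaving the Service with a desired count
    that the cluster cannot place, which is what triggers the retry backoff.
    """
    asg_client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=instance_count,
    )
    ecs_client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=task_count,
    )


# Example usage (hypothetical names; requires boto3 and AWS credentials):
#   import boto3
#   scale_cluster(boto3.client("autoscaling"), boto3.client("ecs"),
#                 "my-asg", "my-cluster", "my-service",
#                 instance_count=0, task_count=0)
```

Scaling both to zero, and later both back up, means the Service is never left retrying against an empty cluster. With a Capacity Provider this coordination should not be needed, since ECS adjusts the ASG itself.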

AWS
EXPERT
answered 2 years ago
EXPERT
reviewed a month ago
  • Thanks Michael for this sound advice: 1) Don't leave an ECS Service with a desired count that can't be fulfilled by the ASG (will suffer backoff-retry and slow recovery), and 2) Prefer EC2 Auto Scaling Capacity Providers.

    I've just done some testing and the Auto Scaling Capacity Provider works nicely when configured correctly. I wasn't yet on this path because a) the CloudFormation I started from didn't use it and b) in my specific case it was easier to detect idleness and scale the ASG to zero directly than to do so for the ECS Service, without custom metrics.
