Unexpected EC2-backed ECS Container Restart Potential

We have an ECS container running on an EC2-backed instance. Assuming we don't do any cluster maintenance that affects the Auto Scaling group, the number of containers using the group, the image type backing the group, the parameters of containers that use the group, and so on, how likely is it that AWS will restart that container without warning? We realize that Fargate can yank and redeploy containers at any time, but EC2 instances created for the purpose of running and scaling containers should be stable unless user-initiated maintenance is performed, correct?

After reading this article (https://repost.aws/knowledge-center/ecs-task-stopped), I'm fairly confident that my container will remain running unless we do maintenance that disturbs the load on the cluster. Still, we have reason to be paranoid about our task restarting unexpectedly. The timing of any restart is critical: specifically, we don't want the task to restart within a particular time window. Is it any more likely to restart than, say, the same process running on a typical EC2 VM? We will keep using plain EC2 instead of ECS on EC2 if AWS-initiated maintenance makes a container restart more likely, so any thoughts are appreciated.

kgminer
asked 5 months ago, 128 views
1 Answer
Hello there,

I understand that you have questions about the availability of tasks/containers on EC2 instances, with and without ECS.

================

  • [+] To begin with, regarding how often AWS will restart an ECS container: AWS generally won't make changes to your ECS containers. Most task restarts are caused by ELB, EC2, or container health check failures. Even then, if the tasks are part of a service, ECS maintains uptime by replacing the failed tasks. [1]
  • [+] Regarding Fargate: yes, AWS can terminate the underlying host, but it doesn't do so often. When AWS needs to patch or otherwise work on the underlying host, it sends a notification through the Personal Health Dashboard (PHD) ahead of the maintenance. That gives you time to restart tasks or roll out a new deployment yourself, so the tasks don't get terminated by AWS at an unexpected time. In short, Fargate offers much the same availability as the EC2 launch type, but you have less visibility into host details. [2]
  • [+] Furthermore, with ECS on EC2, more of the responsibility falls on you than on AWS: you handle patching, updates, and so on. With Fargate, AWS maintains the underlying host, and you only need to make sure your tasks run on the latest release and version. To summarise this point: AWS touches an EC2 host only for hardware issues, whereas with Fargate it can touch the underlying host for anything from updates to hardware issues. [3]
  • [+] Moreover, I'd say the likelihood of an EC2 instance restart is the same whether you run your containers on ECS+EC2 or on a standalone EC2 instance. It's just that with ECS+EC2 you have to ensure that the ECS agent is properly configured and running. If it isn't, the instance becomes unhealthy for ECS use, which leads to a replacement. There may be a few other dependencies, so you'll need to test reliability yourself to find the best option for your use case. [4] [5]
  • [+] Additionally, if you go with ECS+EC2, I'd recommend using a capacity provider with an Auto Scaling group (ASG) for your EC2 instances. This may differ from case to case, but for a highly available application I'd use features such as "Managed termination protection" and "Managed draining", along with a Multi-AZ configuration and spread placement strategies. The references listed at the end give more detail. [6] [7]
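
As a rough illustration of the capacity provider recommendation, here is a minimal Python sketch that builds a request for the ECS `CreateCapacityProvider` API with managed termination protection and managed draining turned on. The helper names, the provider name, and the ARN placeholder are hypothetical; only the request field names come from the ECS API. It assumes boto3 is installed and credentials are configured when you actually send the request, and that the ASG already has instance scale-in protection enabled, which managed termination protection requires (together with managed scaling).

```python
from typing import Any, Dict


def build_capacity_provider_request(
    name: str, asg_arn: str, target_capacity: int = 100
) -> Dict[str, Any]:
    """Build keyword arguments for the ECS CreateCapacityProvider call.

    `name` and `asg_arn` are supplied by the caller; nothing here is
    specific to the original poster's cluster.
    """
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            # Managed termination protection only works with managed scaling on.
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": target_capacity,
            },
            # Keep ECS from scaling in instances that still run tasks.
            "managedTerminationProtection": "ENABLED",
            # Drain tasks off an instance before the ASG terminates it.
            "managedDraining": "ENABLED",
        },
    }


def create_capacity_provider(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to ECS (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    ecs = boto3.client("ecs")
    return ecs.create_capacity_provider(**request)
```

Typical use would be `create_capacity_provider(build_capacity_provider_request("my-provider", "<your ASG ARN>"))`, followed by attaching the provider to the cluster. Keeping the request builder separate from the API call lets you review or unit test the settings before anything touches your account.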

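Since the concern is a restart inside a particular time window, one way to keep an eye on AWS-initiated EC2 maintenance is to look at the scheduled events that `DescribeInstanceStatus` reports for the container instances. The sketch below is only an assumption about how such a check could look: the function names are hypothetical, pagination is ignored, and it covers scheduled EC2 events only, not unplanned hardware failures or ECS health check replacements.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple


def events_in_window(
    instance_statuses: List[Dict[str, Any]],
    window_start: datetime,
    window_end: datetime,
) -> List[Tuple[str, str, datetime]]:
    """Return (instance id, event code, start time) for every scheduled
    event whose time range overlaps the protected window.

    `instance_statuses` has the shape of the "InstanceStatuses" list
    returned by EC2 DescribeInstanceStatus. Use timezone-aware datetimes
    for the window, because boto3 returns timezone-aware timestamps.
    """
    hits = []
    for status in instance_statuses:
        for event in status.get("Events", []):
            not_before = event.get("NotBefore")
            if not_before is None:
                continue  # no start time reported, nothing to compare
            # Treat an event without an end time as a point in time.
            not_after = event.get("NotAfter") or not_before
            if not_before <= window_end and not_after >= window_start:
                hits.append((status["InstanceId"], event["Code"], not_before))
    return hits


def fetch_instance_statuses(instance_ids: List[str]) -> List[Dict[str, Any]]:
    """Fetch statuses from EC2 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so events_in_window works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.describe_instance_status(
        InstanceIds=instance_ids, IncludeAllInstances=True
    )
    return response["InstanceStatuses"]
```

If `events_in_window(fetch_instance_statuses(ids), start, end)` returns anything, you could drain and replace the affected instance ahead of time rather than let the event land inside the window.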
================

  1. A deep dive into Amazon ECS task health and task replacement: https://aws.amazon.com/blogs/containers/a-deep-dive-into-amazon-ecs-task-health-and-task-replacement/
  2. AWS Fargate task retirement notifications: https://aws.amazon.com/blogs/containers/improving-operational-visibility-with-aws-fargate-task-retirement-notifications/
  3. Shared responsibility model in ECS: https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/security-shared.html
  4. EC2 Deprecation: https://repost.aws/knowledge-center/ec2-deprecation-deadline
  5. AutoScaling Terminated EC2 Instance: https://repost.aws/knowledge-center/auto-scaling-instance-how-terminated
  6. Control the instances Amazon ECS terminates: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/managed-termination-protection.html
  7. Easier EC2 instance maintenance with managed draining for Amazon ECS capacity providers: https://community.aws/content/2bMN99heQOCAAkJDC7wwB6ObVyx/manage?lang=en
  8. Amazon ECS Availability Best Practices: https://aws.amazon.com/blogs/containers/amazon-ecs-availability-best-practices/
AWS
answered 14 days ago
