
Pods are getting recreated due to the Fargate node in EKS


Hello Team,

I am running my workload in an EKS cluster on Fargate nodes, and it runs successfully. However, I intermittently see that my Fargate nodes are recreated by AWS, so the Kubernetes StatefulSet and Deployment running on Fargate get recreated as well. This breaks my application, because my StatefulSet and Deployment need to be created in a sequential manner, which does not happen in this case. I want to understand why the Fargate nodes my application uses get updated intermittently, and how I can avoid this issue in the future.

As of now, to resolve the above issue, I have to delete everything from my EKS cluster and recreate it.

Should I use a managed node group instead of Fargate?

Also, I am pulling the image from DockerHub in my Deployment and StatefulSet definitions, so could it be that when my Docker image gets updated, my pods also get updated?

Should I pull the Docker image from ECR instead of DockerHub in my Kubernetes StatefulSet and Deployment?

Tags: Fargate_node, kubernetes_resource, opensearch_image_update

2 Answers

Hi Atif,

I’m sorry for the annoyance that AWS Fargate has caused you by restarting its instances (or microVMs) in a manner that breaks your application.

While I can’t give you a definite answer as to what the problem is (there might be multiple moving parts causing the problem), I hope that some of the information I provide will be of some use. My response is based on the following assumptions about your situation:

  • Your Fargate instance/microVM restarts don't happen very often, but they happen often enough (maybe once or twice a month) to be an annoyance
  • Your statefulset and deployment are meant to be running continuously for an extended period of time (at least several weeks at a time)
  • You've set specific resource limits and requests for your pods, so the amount of CPU and memory allocated to each pod is consistent on every launch.

TL;DR:

  • AWS Fargate will cause occasional restarts on its own for maintenance reasons, so switch to a compute resource that you manage (such as Amazon EC2 nodes) if you want to avoid such mandatory restarts. AWS Fargate usually gives you a heads up about upcoming mandatory updates.
  • Double check that your pods were healthy leading up to the Fargate instance/microVM restart, since an unhealthy pod may trigger a Fargate instance/microVM restart.
  • Using DockerHub instead of Amazon ECR should be fine; just make sure you pin your image version so there are no unexpected restarts when a new image version is released.
  • Consider redesigning your application to be more resilient to interruptions, since containers are meant to handle unexpected interruptions.

First, I am not sure if there is anything that can be done to decrease random restarts of AWS Fargate instances/microVMs. Since AWS Fargate is an AWS-managed service, its uptime and functionality are handled by AWS in a separate, AWS-managed VPC. If AWS determines that an update is necessary for Fargate instances, then AWS will go ahead and force the update; there is no way to opt out of such an update and instance restart. AWS usually gives you a warning about upcoming mandatory maintenance updates for AWS Fargate (source). The only way to avoid such forced updates is to manage the instances yourself (possibly through a Managed Node Group) rather than have AWS manage them for you through Fargate.

I would also recommend taking a look at Amazon CloudWatch Logs or Amazon CloudTrail to see if there are any details or reasons given about why a Fargate instance/microVM is restarted. If you’re interested in how AWS Fargate works under the hood then you can take a look at this blog post.

Second, double check that your pods are continually in a healthy state leading up to the Fargate instance/microVM restart. If a pod on Fargate is unhealthy for long enough, Kubernetes might try to recreate the pod to get it back to a healthy state, consequently causing Fargate to create a new instance/microVM for the new pod.
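
For reference, here is a minimal sketch of liveness and readiness probes that surface pod health to Kubernetes; the container name, image, port, and paths below are placeholders, not taken from your manifests:

```yaml
# Fragment of a Deployment/StatefulSet pod template (hypothetical names and paths).
containers:
  - name: my-app                      # placeholder container name
    image: myrepo/my-app:1.2.3        # placeholder, pinned tag
    ports:
      - containerPort: 8080
    readinessProbe:                   # gates traffic until the app reports ready
      httpGet:
        path: /healthz/ready
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    livenessProbe:                    # restarts the container if it stops responding
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 20
      failureThreshold: 3
```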

Third, pulling an image from DockerHub rather than Amazon ECR should have no impact on the availability of a Fargate instance/microVM as long as the image is successfully pulled down to use in a pod. Ensure that you are pinning the version of the image so that the pod and Fargate microVM are not getting recreated whenever a newer version of the image is released.
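
As a sketch of what pinning looks like in a pod template (the repository, tag, and digest below are placeholders; substitute the image you actually use):

```yaml
# Avoid a floating tag such as "latest"; pin a tag, or better, an immutable digest.
containers:
  - name: opensearch                  # placeholder container name
    # Pinned tag (placeholder version):
    image: docker.io/opensearchproject/opensearch:2.11.1
    # Or pin the digest so even a re-tagged image cannot change what you run:
    # image: docker.io/opensearchproject/opensearch@sha256:<digest>
    imagePullPolicy: IfNotPresent     # avoid re-pulling the tag on every pod start
```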

Fourth, consider redesigning your container implementation to plan for interruptions. Containers are meant to be easily replicated for scaling and for recovering from unexpected interruptions. You could have specific pods check the health of required components to ensure that pods eventually launch with their dependencies available. If a required component does not return a healthy status, the checking pod can either wait before checking again or signal an unhealthy status to the Kubernetes control plane so that it is recreated (hopefully at a time when its dependencies are ready).
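
One way to express that dependency check is sketched below with made-up names ("my-app" and "dependency-svc" are placeholders): an init container blocks the main container until the dependency's Service name resolves in cluster DNS, and podManagementPolicy: OrderedReady keeps the StatefulSet's pods starting one at a time, in order.

```yaml
# Hypothetical StatefulSet fragment; adapt names, ports, and images to your workload.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: my-app
  replicas: 3
  podManagementPolicy: OrderedReady          # create/replace pods one at a time, in order
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      initContainers:
        - name: wait-for-dependency
          image: busybox:1.36
          # Block pod startup until the dependency's Service name resolves in cluster DNS.
          command: ["sh", "-c", "until nslookup dependency-svc; do echo waiting for dependency; sleep 5; done"]
      containers:
        - name: my-app
          image: myrepo/my-app:1.2.3          # placeholder, pinned tag
```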

If your StatefulSet is hosting a database, the AWS best practice is to host the database outside of your Kubernetes cluster in order to reduce the operational overhead that StatefulSets introduce.

I hope that is helpful! If you have follow-up questions or additional information you would like to share then please feel free to do so!

AWS
answered a year ago

The issue you're experiencing with Fargate nodes being recreated intermittently is likely due to AWS applying important security patches and critical updates to the Fargate infrastructure. This is a normal part of AWS's maintenance process to ensure the security and stability of the Fargate platform.

To minimize disruption to your application, you can take several steps:

  1. Use Kubernetes Deployments or StatefulSets instead of standalone pods. These controllers manage the pod lifecycle, and a StatefulSet in particular can start and replace its pods in a defined order.

  2. Implement Pod Disruption Budgets (PDBs) for your workloads. PDBs help maintain application availability during voluntary disruptions like node updates (see the PDB sketch after this list).

  3. Design your application to be more resilient to pod restarts. This could involve implementing proper startup and shutdown procedures, using persistent storage where necessary, and ensuring your application can handle temporary unavailability of some components (see the graceful-shutdown sketch after this list).

  4. Consider using Amazon EKS managed node groups instead of Fargate if you need more control over the underlying infrastructure. Managed node groups allow for more predictable scheduling and potentially less frequent disruptions (an example node group configuration follows below).
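
For point 2, a minimal PodDisruptionBudget sketch (the name, label selector, and minAvailable value are placeholders; tune them to your replica count):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb                 # placeholder name
spec:
  minAvailable: 2                  # keep at least 2 pods running during voluntary disruptions
  selector:
    matchLabels:
      app: my-app                  # must match your Deployment/StatefulSet pod labels
```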
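
For point 3, a hedged sketch of a graceful-shutdown configuration in a pod template (the command and timings are placeholders for whatever drain or flush step your application needs):

```yaml
# Fragment of a pod template; gives the application time to finish work before SIGKILL.
spec:
  terminationGracePeriodSeconds: 60          # default is 30s; raise it if shutdown is slow
  containers:
    - name: my-app                           # placeholder container name
      image: myrepo/my-app:1.2.3
      lifecycle:
        preStop:
          exec:
            # Placeholder: tell the app to stop accepting work and flush state.
            command: ["sh", "-c", "sleep 10"]
```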
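
And for point 4, if you evaluate managed node groups, a hedged eksctl configuration sketch (the cluster name, region, instance type, and sizes are placeholders):

```yaml
# eksctl ClusterConfig fragment adding a managed node group alongside your Fargate profiles.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster                 # placeholder cluster name
  region: us-east-1                # placeholder region
managedNodeGroups:
  - name: stateful-workloads
    instanceType: m5.large
    minSize: 2
    maxSize: 4
    desiredCapacity: 3
    labels:
      workload: stateful           # target these nodes via nodeSelector in your StatefulSet
```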

Regarding your Docker image: if you're pulling from DockerHub using a mutable tag such as 'latest' rather than a fixed version tag, your pods might indeed be updated when the image is updated. To prevent this, use specific version tags (or image digests) for your Docker images.

Deleting and recreating your entire EKS cluster should not be necessary to resolve this issue. Instead, focus on improving your application's resilience to pod restarts and implementing the strategies mentioned above.

If you continue to experience issues, you may want to reach out to AWS support for more specific guidance tailored to your use case.
Sources
FAQs on Fargate Pod eviction notice | AWS re:Post
Announcing Node Health Monitoring and Auto-Repair for Amazon EKS - AWS

answered a year ago
