EKS Fargate nodes that are not removed after their jobs and pods have terminated are unusual and can lead to unexpected costs. Here are some insights and suggestions to help you address the problem:
- Pod Lifecycle on Fargate: When a Kubernetes Job finishes as 'Completed' or 'Failed', the pods it created are not deleted automatically. This is by design, so that you can still view their logs and results. On Fargate, however, each pod runs on its own node, so a finished pod that is never deleted can keep its node around and lead to ongoing costs if not managed properly.
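- TTL Controller: You've mentioned using the TTL controller (ttlSecondsAfterFinished set to 60 seconds), which is a good practice. When the TTL expires, the controller deletes the finished Job, and the Job's pods are deleted along with it. Note that ttlSecondsAfterFinished is a field of the Job spec, so in a CronJob it belongs under spec.jobTemplate.spec rather than at the top level of the CronJob spec. If pods still linger, first confirm that the field is set at that level and that the finished Jobs themselves are being deleted; if they are not, there might be an issue with the TTL controller or how it's interacting with Fargate.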
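- Fargate Profile Configuration: Ensure that your Fargate profile is correctly configured to match the pods created by your CronJobs. Profile matching is based on the pod's namespace and labels and is evaluated when the pod is scheduled, not after the job completes. A pod that matches no Fargate profile, and has no other nodes to run on, stays in a Pending state. This is more likely to explain pods that never start than nodes that never terminate, but it is worth ruling out.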
- Manual Pod Deletion: As a temporary workaround, you could implement a script that periodically checks for completed jobs and manually deletes any associated pods that haven't been cleaned up.
- Logging and Monitoring: Enable detailed logging for your cluster and pods. Check the Kubernetes events (for example with kubectl get events and kubectl describe pod) and the logs for any errors or warnings related to pod termination or Fargate node draining.
- AWS Support: If the issue persists and you can't identify the root cause, it may be worth contacting AWS support. This could be a specific interaction between EKS, Fargate, and your workload that requires deeper investigation.
- Pod Disruption Budgets: While not directly related to your issue, ensure you have appropriate Pod Disruption Budgets (PDBs) set up. A PDB limits how many pods of a workload can be down at the same time during voluntary disruptions such as updates or node recycling.
- Fargate OS Patching: Be aware that Amazon EKS periodically patches the OS for Fargate nodes. During this process, nodes are recycled and the pods on them are evicted, which could potentially interfere with normal pod termination. Ensure your workloads handle these events properly.
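To make the TTL controller point above concrete, here is a minimal Python sketch that builds a CronJob manifest with ttlSecondsAfterFinished at the Job level (spec.jobTemplate.spec). The function name, the CronJob name, schedule, and image are hypothetical placeholders, not values from your setup; only the field placement is the point.

```python
import json


def cronjob_manifest(name, schedule, image, ttl_seconds=60):
    """Build a batch/v1 CronJob manifest as a plain dict.

    The TTL is set on the Job template's spec, which is where the
    TTL controller reads it from once a Job has finished.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    # Job-level field: delete the Job (and its pods)
                    # this many seconds after it completes or fails.
                    "ttlSecondsAfterFinished": ttl_seconds,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{"name": name, "image": image}],
                        }
                    },
                }
            },
        },
    }


# Hypothetical values, for illustration only.
manifest = cronjob_manifest("report", "*/5 * * * *", "busybox:1.36")
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON manifests, so the printed output can be compared against what is actually deployed (kubectl get cronjob NAME -o json) to check that the field sits at the same level.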
Remember, Fargate is designed to simplify container management, so persistent issues with node termination are not expected. Continue to monitor the situation closely, and if the problem persists after trying these suggestions, it may indicate a more systemic issue that requires AWS's attention.
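The manual cleanup workaround mentioned above could look roughly like the following sketch. It assumes kubectl is installed and configured for the cluster, and that deleting a finished Job (which also removes its pods by default) is acceptable. The function names and the 60-second grace period are illustrative assumptions, not part of any AWS tooling.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def finished_at(job):
    """Return when a Job became Complete or Failed, or None if it has not finished."""
    for cond in job.get("status", {}).get("conditions") or []:
        if cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True":
            ts = cond.get("lastTransitionTime")
            if ts:
                parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
                return parsed.replace(tzinfo=timezone.utc)
    return None


def stale_jobs(job_list, now, grace=timedelta(seconds=60)):
    """Names of Jobs that finished at least `grace` before `now`."""
    names = []
    for job in job_list.get("items", []):
        done = finished_at(job)
        if done is not None and now - done >= grace:
            names.append(job["metadata"]["name"])
    return names


def cleanup(namespace):
    """Delete finished Jobs in one namespace; their pods go with them."""
    out = subprocess.run(
        ["kubectl", "get", "jobs", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in stale_jobs(json.loads(out), datetime.now(timezone.utc)):
        subprocess.run(["kubectl", "delete", "job", name, "-n", namespace], check=True)
```

Calling cleanup("your-namespace") on a schedule would approximate what the TTL controller is supposed to do. Treat it as a stopgap while the root cause is investigated, not as a replacement for ttlSecondsAfterFinished.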
Sources
Set actions for AWS Fargate OS patching events - Amazon EKS
Simplify compute management with AWS Fargate - Amazon EKS
