Managing resources in Kubernetes can pose unexpected challenges. One issue that has puzzled many Kubernetes users is an unexplained, steady growth in the number of node lease objects in the 'kube-node-lease' namespace.
If you have run into this, read on: we dug into the root cause and describe the mitigations available to Kubernetes users.
Understanding the Landscape:
The following are the key components involved:
- Leases: In Kubernetes, Lease objects are used for critical functions such as node heartbeats and leader election. Each node's kubelet periodically renews a Lease with the same name as the node in the 'kube-node-lease' namespace, and the control plane treats these renewals as the node's heartbeat.
- Garbage Collector: Kubernetes runs a garbage collector (part of kube-controller-manager) that deletes dependent objects once their owners are gone, keeping the cluster free of orphaned resources.
- Owner References: Dependent objects carry a “ownerReferences” field in their metadata that links them to their owner objects. This field drives garbage collection and object adoption. A node lease is normally owned by its Node object, so deleting the Node also deletes its lease.
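To see these relationships on a live cluster, you could list each node lease alongside its owner. The bash sketch below assumes kubectl access to the cluster and jq on your PATH; lease_owners is a hypothetical helper name, not part of either tool. A healthy lease should show Node/<node name>, while a leaked one shows <none>.

```shell
# Hypothetical helper (assumes jq is installed).
# Reads the JSON from `kubectl get lease -n kube-node-lease -o json` on stdin and
# prints "<lease name><TAB><owner>" per lease. Owners are shown as "Kind/name";
# leases without any ownerReferences are shown as "<none>".
lease_owners() {
  jq -r '.items[]
    | [ .metadata.name,
        ( (.metadata.ownerReferences // [])
          | if length == 0 then "<none>"
            else map(.kind + "/" + .name) | join(",") end ) ]
    | @tsv'
}

# Example usage against a live cluster:
#   kubectl get lease -n kube-node-lease -o json | lease_owners
```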
Problem Statement:
Customers running EKS clusters notice a constant increase in the number of node lease objects. Notably, these leaked leases have no “ownerReferences”. This behavior is not by design; it is the result of a Kubernetes bug.
Reproducing the issue:
- Create an EKS cluster with worker nodes.
- Delete a worker's Node object (while the kubelet and the underlying instance keep running) and watch the garbage collector remove the corresponding lease object.
But here's the catch: the still-running kubelet recreates its lease, and because the Node object no longer exists, the new lease has no owner reference, so the garbage collector never removes it. Every such cycle leaves an ownerless lease behind, and the resulting growth in lease objects is visible in monitoring tools (CloudWatch, Datadog, New Relic, and others).
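The reproduction steps above can be observed with a bash sketch like the following, which counts leases with and without ownerReferences so you can compare the numbers before and after deleting a Node object. It assumes kubectl and jq; count_leases and watch_leases are hypothetical helper names, and NODE is a placeholder for one of your worker node names. Deleting a Node is disruptive, so try this only on a test cluster.

```shell
# Hypothetical helper (assumes jq is installed).
# Reads `kubectl get lease -n kube-node-lease -o json` on stdin and prints
# "<total leases> <leases without ownerReferences>".
count_leases() {
  jq -r '[.items[] | (.metadata.ownerReferences // []) | length == 0] as $flags
    | "\($flags | length) \($flags | map(select(.)) | length)"'
}

# Hypothetical helper (assumes kubectl access): poll every INTERVAL seconds
# (default 30) and print a UTC timestamp followed by the counts above.
watch_leases() {
  local interval="${1:-30}"
  while true; do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      "$(kubectl get lease -n kube-node-lease -o json | count_leases)"
    sleep "$interval"
  done
}

# Reproduction outline (test cluster only):
#   NODE=<worker node name>                       # placeholder
#   kubectl get lease -n kube-node-lease -o json | count_leases
#   kubectl delete node "$NODE"                   # kubelet keeps running
#   watch_leases 10                               # the ownerless count goes up
```

Running count_leases before the deletion and watch_leases afterwards should show the ownerless count grow by one once the kubelet recreates its lease.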
Mitigating the Problem:
Now that we understand the cause, let's look at the mitigations:
- Manual Cleanup: If you are not using Karpenter, you can remove the leaked leases (those without “ownerReferences”) with the following command:
kubectl get lease -n kube-node-lease -o json | jq -r '.items[] | select(.metadata.ownerReferences == null) | .metadata.name' | xargs -r -n 10 kubectl delete lease -n kube-node-lease
- Karpenter Integration: For EKS clusters that use Karpenter, Karpenter now includes a controller that garbage-collects leaked node leases, providing a fix until the upstream bug is resolved.
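The manual cleanup command above deletes every lease in kube-node-lease that has no ownerReferences. A more cautious variant, sketched below in bash under the same kubectl and jq assumptions, also skips any lease whose name still matches an existing Node (node leases are named after their node) and only prints what it would delete unless you pass --delete. The function names and the --delete flag are hypothetical and not part of any tool.

```shell
# Hypothetical helper (assumes jq is installed).
# $1: JSON from `kubectl get lease -n kube-node-lease -o json`
# $2: JSON from `kubectl get nodes -o json`
# Prints the names of leases that have no ownerReferences AND no Node of the same name.
stale_lease_names() {
  local leases_json="$1" nodes_json="$2"
  jq -r --slurpfile nodes "$nodes_json" '
    ($nodes[0].items | map({(.metadata.name): true}) | add // {}) as $live
    | .items[]
    | select((.metadata.ownerReferences // []) | length == 0)
    | .metadata.name
    | select($live[.] != true)' "$leases_json"
}

# Hypothetical wrapper (assumes kubectl access): dry run by default,
# pass --delete to actually delete the stale leases.
cleanup_stale_leases() {
  local tmp
  tmp="$(mktemp -d)" || return 1
  kubectl get lease -n kube-node-lease -o json > "$tmp/leases.json" &&
    kubectl get nodes -o json > "$tmp/nodes.json" || { rm -rf "$tmp"; return 1; }
  if [ "${1:-}" = "--delete" ]; then
    # -r: skip running kubectl entirely when there is nothing to delete
    stale_lease_names "$tmp/leases.json" "$tmp/nodes.json" |
      xargs -r -n 10 kubectl delete lease -n kube-node-lease
  else
    stale_lease_names "$tmp/leases.json" "$tmp/nodes.json"
  fi
  rm -rf "$tmp"
}
```

Running cleanup_stale_leases first without arguments lets you review the list before committing to cleanup_stale_leases --delete.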
References:
- Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/109777
- Karpenter issue: https://github.com/aws/karpenter/issues/4363
- Karpenter controller to fix the node lease leak: https://github.com/aws/karpenter-core/pull/471
Co-authors:
- Sidhartha Kotha
- Dharmendra Singh