Nodes launched by EKS Auto Mode have a maximum lifetime of 21 days. This article shows how to set a maintenance window to allow controlled node replacement.
This guide provides configuration guidance for scheduling drifted node maintenance in EKS Auto Mode clusters, specifically allowing updates during a controlled maintenance window while restricting updates at other times.
EKS Auto Mode uses Amazon EC2 managed instances. For security reasons, Auto Mode enforces a maximum lifetime of 21 days for those instances (see Auto Mode Features).
EKS Auto Mode releases a new Auto Mode AMI frequently, sometimes once per week.
Under the hood, Karpenter provisions compute capacity and manages the lifecycle of the corresponding nodes. When a new AMI is available, Karpenter NodeClaims are marked "AmiDrifted" (see the NodeClaim status section).
The Karpenter NodePool CRD steers disruption through disruption budgets and schedules. Disruption can be allowed or prohibited per so-called reason, with "Drifted" being one of them.
If no budgets are defined, Karpenter defaults to a single budget with nodes: 10%, which allows 10% of nodes to be disrupted for any reason at any time (see NodePool Disruption Budgets in the Karpenter docs).
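Written out explicitly, that implicit default would look roughly like the fragment below. This is a sketch of the relevant part of a NodePool only, not a complete manifest.

```yaml
# Fragment of a NodePool spec: the default budget, written out explicitly
spec:
  disruption:
    budgets:
      # No "reasons" and no "schedule": applies to every reason at all times
      - nodes: "10%"
```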
The following NodePool disruption specification allows up to 10% of nodes to be replaced for the reason "Drifted" during a 2-hour window every Monday starting at 18:00 (6 PM) UTC. Outside this window, disruption for "Drifted" is completely prohibited. Up to 20% of nodes may be disrupted for the reason "Empty" at all times.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: your-nodepool-name # Replace with your desired nodepool name
spec:
  disruption:
    budgets:
      # General protection: Allows empty node removal up to 20% at any time
      - nodes: "20%" # up to 20% of empty nodes at any time
        reasons:
          - Empty
      # Blocks drifted node updates on Sunday and Tuesday through Saturday
      - schedule: "0 0 * * sun,tue-sat"
        duration: 24h
        nodes: "0"
        reasons:
          - Drifted
      # Blocks drifted node updates on Monday from 00:00 to 18:00
      - schedule: "0 0 * * mon"
        duration: 18h
        nodes: "0"
        reasons:
          - Drifted
      # Maintenance window: Allows drifted node updates on Monday 18:00-20:00 (6-8 PM)
      - schedule: "0 18 * * mon"
        duration: 2h # make duration long enough to allow rotation
        nodes: "10%" # Allows up to 10% of nodes to be updated during this window
        reasons:
          - Drifted
      # Blocks drifted node updates on Monday from 20:00 to 24:00
      - schedule: "0 20 * * mon"
        duration: 4h
        nodes: "0"
        reasons:
          - Drifted
Notes:
- If no budget is specified for a given period of time, disruption is unrestricted during that period.
- sun is defined as day 0, so it has to come first in the schedule definition, for example schedule: "0 0 * * sun,tue-sat"
- Timezones are not currently supported. Schedules are always in UTC, see Schedule in Karpenter docs.
- Modify schedule, duration and nodes to meet your application and business needs.
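Because schedules are evaluated in UTC, a window planned in local time has to be converted by hand. As an illustration only, assume the window should run on Monday from 18:00 to 20:00 Central European Time (UTC+1), which is 17:00 to 19:00 UTC. The time zone is a hypothetical choice for this sketch. The three Monday budgets would then shift as follows, while the Sunday and Tuesday-through-Saturday block stays unchanged:

```yaml
# Fragment of spec.disruption.budgets; Monday entries only.
# Assumption: target window is Monday 18:00-20:00 CET (UTC+1) = 17:00-19:00 UTC
- schedule: "0 0 * * mon"   # block Monday 00:00-17:00 UTC
  duration: 17h
  nodes: "0"
  reasons:
    - Drifted
- schedule: "0 17 * * mon"  # maintenance window, Monday 17:00-19:00 UTC
  duration: 2h
  nodes: "10%"
  reasons:
    - Drifted
- schedule: "0 19 * * mon"  # block Monday 19:00-24:00 UTC
  duration: 5h
  nodes: "0"
  reasons:
    - Drifted
```

The UTC schedule does not follow daylight saving time. During Central European Summer Time (UTC+2), the same budgets open the window from 19:00 to 21:00 local time unless the schedules are adjusted.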
Disruption is still subject to other settings such as terminationGracePeriod, which should be chosen with the budget schedule and duration in mind.
See the Karpenter Provider AWS issue "Document behaviour when terminationGracePeriod is longer then budget schedule duration" (#8526) for additional information.
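For orientation, the fragment below shows where terminationGracePeriod sits in a NodePool relative to the maintenance-window budget. The value 1h is purely illustrative and is an assumption of this sketch, not a recommendation. It is simply shorter than the 2-hour window. How the two settings interact when the grace period exceeds the window is the subject of the issue referenced above.

```yaml
# Fragment of a NodePool spec, not a complete manifest
spec:
  template:
    spec:
      # Illustrative value (assumption): shorter than the 2h maintenance window
      terminationGracePeriod: 1h
  disruption:
    budgets:
      - schedule: "0 18 * * mon"
        duration: 2h
        nodes: "10%"
        reasons:
          - Drifted
```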
Carefully test this configuration in a non-production environment first. Monitor the first few maintenance windows to ensure proper operation.