
EKS Auto Mode — NodeClass and NodePool revert to system defaults. How can I migrate to new subnets?


Background

I am migrating two EKS Auto Mode clusters (Kubernetes 1.35) from old /25 private subnets to new /23 private subnets due to IP exhaustion. I have already:

  • Created the new /23 subnets and associated them with the private route table
  • Updated both clusters via aws eks update-cluster-config to reference only the new subnet IDs
  • Verified the change with aws eks describe-cluster --query cluster.resourcesVpcConfig.subnetIds — the API correctly reflects only the new subnet IDs
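The update and verification steps above can be scripted roughly as below. This is a minimal bash sketch, assuming a configured AWS CLI; `set_cluster_subnets` and `show_cluster_subnets` are hypothetical helper names. `--resources-vpc-config subnetIds=...` replaces the control plane's subnet list, and the update ID is printed so the update can be tracked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the EKS control plane at a new set of subnets.
# Usage: set_cluster_subnets <cluster-name> <subnet-id> [<subnet-id> ...]
set_cluster_subnets() {
  local cluster="$1"; shift
  local subnets
  # Join the remaining arguments with commas, as subnetIds=... expects.
  subnets="$(IFS=,; echo "$*")"
  aws eks update-cluster-config \
    --name "$cluster" \
    --resources-vpc-config "subnetIds=${subnets}" \
    --query 'update.id' --output text
}

# Hypothetical helper: print the subnet IDs the control plane now reports.
show_cluster_subnets() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.subnetIds' --output text
}
```

Note that this only changes the control plane's view; as the rest of this thread shows, it does not update the default NodeClass.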

The EKS control plane is correctly configured. The problem is that newly provisioned nodes continue to be placed in the old /25 subnets because the default NodeClass keeps reverting to the old subnet IDs.


What I have tried

1. Manual kubectl patch on the default NodeClass

Patched nodeclass.eks.amazonaws.com/default to replace subnetSelectorTerms with the new subnet IDs. The patch is acknowledged but EKS Auto Mode overrides it within minutes, restoring the old subnet IDs.

2. Deleting the NodeClass and waiting for recreation

Deleted the default NodeClass expecting Auto Mode to recreate it from the current cluster VPC config. It was recreated within seconds — but with the old subnet IDs, not the ones in resourcesVpcConfig.subnetIds. NodeClass recreation does not sync from the cluster VPC config.

3. Creating a custom NodeClass

Created a separate NodeClass resource with the new subnet IDs, managed by Helm. This custom NodeClass is not reverted by EKS. However, the built-in NodePools (general-purpose and system) always reference nodeClassRef: default, and EKS reverts any change to their nodeClassRef.

4. Patching NodePool nodeClassRef

Patched both NodePools to reference the custom NodeClass. EKS Auto Mode reverts this within minutes, regardless of managed-by labels.

5. Adopting NodePools into Helm management

Added Helm ownership annotations (meta.helm.sh/release-name, meta.helm.sh/release-namespace, app.kubernetes.io/managed-by: Helm) to both NodePools and declared them in a Helm chart so every deploy re-enforces the desired state. This held for several hours, but overnight EKS deleted the custom NodeClass entirely, recreated the default NodeClass with old subnets, and reverted both NodePools to nodeClassRef: default with managed-by: eks.

6. Replacing the default NodeClass with a Helm-managed copy

Adopted nodeclass.eks.amazonaws.com/default into Helm and updated its subnetSelectorTerms to the new subnets. EKS immediately re-patches the spec back to the old subnet IDs despite Helm ownership labels being present.


Observed behaviour

  • aws eks describe-cluster --query cluster.resourcesVpcConfig.subnetIds correctly shows only the new subnet IDs
  • kubectl get nodeclass default -o yaml always shows old subnet IDs in subnetSelectorTerms
  • EKS Auto Mode ignores app.kubernetes.io/managed-by: Helm on the default NodeClass and continues patching its spec
  • EKS Auto Mode reverts NodePool nodeClassRef back to default overnight even when the NodePools carry Helm ownership annotations
  • Custom NodeClasses created independently of Auto Mode are deleted by EKS overnight
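The mismatch between the first two observations can be checked with a small script. A sketch under stated assumptions: bash, the AWS CLI and kubectl pointed at the same cluster, and a NodeClass that selects subnets by `id` (tag-based `subnetSelectorTerms` are ignored). `nodeclass_subnet_drift` is a hypothetical name; it prints subnet IDs the NodeClass still selects that are missing from `resourcesVpcConfig.subnetIds`.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: print subnet IDs selected by a NodeClass that are
# not in the cluster's resourcesVpcConfig.subnetIds.
# Usage: nodeclass_subnet_drift <cluster-name> [<nodeclass-name>]
nodeclass_subnet_drift() {
  local cluster="$1" nodeclass="${2:-default}"
  local cluster_subnets nodeclass_subnets id
  # AWS CLI text output is tab-separated; normalise to single spaces.
  cluster_subnets="$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.resourcesVpcConfig.subnetIds' --output text | tr '\t\n' '  ')"
  # Only terms that select by explicit id are read; tag-based terms are skipped.
  nodeclass_subnets="$(kubectl get nodeclass "$nodeclass" \
    -o jsonpath='{.spec.subnetSelectorTerms[*].id}')"
  for id in $nodeclass_subnets; do
    case " $cluster_subnets " in
      *" $id "*) ;;        # still part of the cluster config
      *) echo "$id" ;;     # stale: the NodeClass selects it, the cluster does not
    esac
  done
}
```

An empty result from `nodeclass_subnet_drift <cluster-name>` means the default NodeClass only references subnets the cluster itself lists.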

Questions

  1. What is the correct procedure to migrate an EKS Auto Mode cluster to new private subnets, given that the default NodeClass does not sync from resourcesVpcConfig.subnetIds?
  2. Is there a supported API or configuration to explicitly set which subnets Auto Mode uses for node provisioning?
  3. Is deleting the old subnets from the VPC entirely the only reliable way to force Auto Mode to stop placing nodes in them?
  4. Why does EKS Auto Mode override resources that carry app.kubernetes.io/managed-by: Helm?
asked a month ago · 123 views
5 Answers

The behavior you are seeing is expected for EKS Auto Mode. The built-in resources (the default NodeClass and the general-purpose/system NodePools) are "managed" resources. The EKS Control Plane acts as the source of truth and its reconciliation loop will overwrite any manual changes, including those marked with Helm ownership labels.

The reason the default NodeClass doesn't sync with your resourcesVpcConfig.subnetIds change is that EKS Auto Mode treats these built-in resources as a static "bootstrap" configuration.

Why your current attempts are failing:

  • Helm/Kubectl Patches: EKS Auto Mode uses a high-priority reconciliation loop that ignores app.kubernetes.io/managed-by: Helm.
  • Finalizer Deletion: Do not manually remove finalizers from the default NodeClass. This can lead to orphaned EC2 resources and inconsistent state in the AWS backend.

I think the following migration approach would work:

Instead of fighting the "managed" default resources, you should transition your workloads to custom resources.

1. Create a Custom NodeClass

Define a new NodeClass that explicitly points to your new /23 subnets.

apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: private-v2
spec:
  subnetSelectorTerms:
    - id: subnet-0123456789abcdef0 # New Subnet A
    - id: subnet-0123456789abcdef1 # New Subnet B
  securityGroupSelectorTerms:
    - id: sg-0123456789abcdef0 # e.g. the security group(s) the default NodeClass uses
  role: "AmazonEKSAutoNodeRole" # Ensure this matches your cluster's node role

2. Create Custom NodePools

Create new NodePools that reference your new NodeClass.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose-v2
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: private-v2
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        # Add architecture or instance type requirements as needed

3. Migration (Drain the old nodes)

Once the new NodePools are active, you can migrate your workloads:

  • Cordon and drain the nodes in the old subnets: kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data (drain cordons the node before evicting its pods).
  • The EKS Auto Mode controller will see the pending pods and provision new nodes using your custom NodePool (and thus the new subnets).
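The drain step can be applied to a whole NodePool at once, for example every node the built-in general-purpose pool launched. A bash sketch, assuming Auto Mode nodes carry the Karpenter `karpenter.sh/nodepool` label; `drain_nodepool_nodes` is hypothetical and the 10-minute timeout is an arbitrary choice. Evicted pods may still land on a built-in NodePool if its requirements also match them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cordon, then drain, every node launched by one NodePool.
# Usage: drain_nodepool_nodes <nodepool-name>
drain_nodepool_nodes() {
  local pool="$1" nodes node
  nodes="$(kubectl get nodes -l "karpenter.sh/nodepool=${pool}" \
    -o jsonpath='{.items[*].metadata.name}')"
  # Cordon all nodes first so evicted pods cannot move to another node of this pool.
  for node in $nodes; do
    kubectl cordon "$node"
  done
  # DaemonSet pods are left in place; emptyDir data on each node is discarded.
  for node in $nodes; do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m
  done
}
```

Cordoning the whole pool before draining is a design choice: draining one node at a time without it can bounce pods between old nodes several times.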

4. "Disabling" the Defaults

You cannot easily delete the default NodeClass without EKS recreating it. However, you can effectively "disable" the built-in NodePools by ensuring they have no capacity or by applying a Taint that your workloads do not tolerate.

Answers to your specific questions:

1. Correct Procedure: Don't migrate the default NodeClass. Create a side-by-side custom NodeClass/NodePool set and migrate workloads there.

2. Supported API: The NodeClass (eks.amazonaws.com/v1) and NodePool (karpenter.sh/v1) CRDs are the official API for this.

3. Deleting old subnets: This is a "brute force" method. It would force a failure in the default provisioner, but it's cleaner to simply move your workloads to a NodePool that explicitly knows about the new subnets.

4. Helm Overrides: EKS Auto Mode is a "managed service" within the cluster. Its controller logic has higher precedence than Helm's client-side management.

Note: Always ensure your AmazonEKSAutoNodeRole has the necessary permissions to describe and join nodes within the new subnets.

EXPERT
answered a month ago

The reason your kubectl patches aren't sticking is that the EKS Control Plane acts as the ultimate source of truth. As mentioned in Step 2 of the Recommended Migration Approach, you cannot "disable" these via Kubernetes commands alone.

To stop the system NodePool from reappearing and using the old subnets, you must explicitly tell the AWS API to stop managing them:

Final Step

Run this AWS CLI command (not kubectl):

aws eks update-cluster-config \
  --name <cluster-name> \
  --compute-config '{
    "nodeRoleArn": "arn:aws:iam::<account-id>:role/AmazonEKSAutoNodeRole",
    "nodePools": [],
    "enabled": true
  }'

After that, your system-v2 will handle everything, and you'll finally be able to delete that old subnet!

  • By passing "nodePools": [], you instruct EKS to stop reconciling the system and general-purpose pools.
  • Once the update has completed and the cluster is ACTIVE again, the EKS reconciliation loop will "let go."
  • You can then safely kubectl delete the old NodePools and the default NodeClass one last time. They will not be recreated.
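Before deleting anything, it helps to confirm that the update has actually finished. A bash sketch, assuming the update ID was captured from `aws eks update-cluster-config ... --query update.id --output text`; `wait_for_eks_update` is a hypothetical helper that polls `aws eks describe-update` until the status leaves `InProgress`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll one EKS cluster update until it finishes.
# Usage: wait_for_eks_update <cluster-name> <update-id> [<interval-seconds>]
wait_for_eks_update() {
  local cluster="$1" update_id="$2" interval="${3:-30}" status
  while :; do
    status="$(aws eks describe-update --name "$cluster" --update-id "$update_id" \
      --query 'update.status' --output text)"
    case "$status" in
      Successful) echo "$status"; return 0 ;;
      Failed|Cancelled) echo "update $update_id: $status" >&2; return 1 ;;
      "") echo "describe-update returned no status" >&2; return 1 ;;
      *) sleep "$interval" ;;   # typically InProgress: check again later
    esac
  done
}
```

`aws eks wait cluster-active --name <cluster-name>` is a coarser alternative that waits on the cluster status rather than on one specific update.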

PS: If my answer was helpful, I would appreciate it if you could mark it as the accepted answer.

EXPERT
answered a month ago

Thank you for the guidance, Florian. Here is a summary of what we observed when implementing your recommendations.

Following your suggestion, we ran aws eks update-cluster-config with nodePools: [] to stop EKS from reconciling the built-in NodePools. However, the API rejected an empty list:

InvalidParameterException: Invalid Compute Config Node Pool values(s). Node pool values are case-sensitive and must be general-purpose and/or system.

An empty nodePools array is not a valid value — the API only accepts "general-purpose" and/or "system". We suspect this may be a version difference in the EKS Auto Mode API. We found that passing a list with only "general-purpose" (omitting "system") was accepted:

aws eks update-cluster-config \
  --name <cluster-name> \
  --region <region> \
  --compute-config '{"enabled":true,"nodePools":["general-purpose"],"nodeRoleArn":"arn:aws:iam::<account-id>:role/AmazonEKSAutoNodeRole"}' \
  --kubernetes-network-config 'elasticLoadBalancing={enabled=true}' \
  --storage-config 'blockStorage={enabled=true}'
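Since the node pool values are case-sensitive, generating the `--compute-config` JSON from a checked list avoids typos such as `General-Purpose`. A bash sketch; `compute_config_json` is hypothetical and only validates against the two names listed in the error message above. It will still emit an empty `nodePools` array if called with no pools, which the API rejected in this case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --compute-config JSON for EKS Auto Mode.
# Only the two built-in names from the error message are accepted.
# Usage: compute_config_json <node-role-arn> [<pool> ...]
compute_config_json() {
  local role_arn="$1"; shift
  local pools="" pool
  for pool in "$@"; do
    case "$pool" in
      general-purpose|system) ;;
      *) echo "invalid built-in node pool: $pool" >&2; return 1 ;;
    esac
    pools="${pools:+$pools,}\"$pool\""
  done
  printf '{"enabled":true,"nodePools":[%s],"nodeRoleArn":"%s"}\n' "$pools" "$role_arn"
}

# Example, mirroring the command above:
# aws eks update-cluster-config --name <cluster-name> --region <region> \
#   --compute-config "$(compute_config_json "arn:aws:iam::<account-id>:role/AmazonEKSAutoNodeRole" general-purpose)" \
#   --kubernetes-network-config 'elasticLoadBalancing={enabled=true}' \
#   --storage-config 'blockStorage={enabled=true}'
```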

After this update, EKS immediately deleted the system NodePool. Our custom system-v2 NodePool (backed by a custom NodeClass pointing at the new subnets) then took over scheduling for system workloads.

Once the old subnets were empty, we were finally able to delete them all.

However, when we later attempted to revert to the EKS-managed NodePools (re-adding "system" to nodePools), we discovered that the EKS-managed default NodeClass still contained the old, now-deleted subnet IDs. EKS had not updated it when the cluster VPC config was changed to the new subnets. As a result, even after re-enabling both built-in NodePools, they remained at READY: False because they could not provision nodes into subnets that no longer existed. The fix was a manual kubectl patch on the default NodeClass with the correct subnet IDs:

kubectl patch nodeclass default --type=merge -p '{
  "spec": {
    "subnetSelectorTerms": [
      {"id": "subnet-<az-a>"},
      {"id": "subnet-<az-b>"},
      {"id": "subnet-<az-c>"}
    ]
  }
}'
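The manual patch above can be derived from the cluster's own subnet list so the two cannot drift apart. A bash sketch, assuming every subnet in `resourcesVpcConfig.subnetIds` is one you want nodes in (if the cluster also lists subnets that should not host nodes, pass explicit IDs instead); `sync_nodeclass_subnets` is hypothetical. A JSON merge patch replaces the whole `subnetSelectorTerms` list rather than appending to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a NodeClass's subnetSelectorTerms from the
# subnet IDs the cluster reports in resourcesVpcConfig.subnetIds.
# Usage: sync_nodeclass_subnets <cluster-name> [<nodeclass-name>]
sync_nodeclass_subnets() {
  local cluster="$1" nodeclass="${2:-default}" terms="" id
  for id in $(aws eks describe-cluster --name "$cluster" \
      --query 'cluster.resourcesVpcConfig.subnetIds' --output text); do
    terms="${terms:+$terms,}{\"id\":\"$id\"}"
  done
  # Refuse to patch with an empty list (e.g. if describe-cluster failed).
  if [ -z "$terms" ]; then
    echo "no subnets returned for cluster $cluster" >&2
    return 1
  fi
  # Merge patch: the array is replaced wholesale, not merged element by element.
  kubectl patch nodeclass "$nodeclass" --type=merge \
    -p "{\"spec\":{\"subnetSelectorTerms\":[${terms}]}}"
}
```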

EKS accepted this patch and did not revert it. After patching, both EKS-managed NodePools became READY: True.

The original goal of moving the cluster to the new subnets has been achieved, albeit with considerable effort. We have two remaining questions:

  1. Is the default NodeClass expected to be automatically reconciled when aws eks update-cluster-config --resources-vpc-config is used to change subnet IDs? In our case it was not, and this caused a significant outage when we removed our custom NodePools before the built-in ones were ready.

  2. Is there a recommended procedure for migrating the default NodeClass to new subnets that avoids a gap between removing custom NodePools and the built-in ones becoming ready?
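One way to avoid the gap described in question 2 is to gate the removal of the custom NodePools on the built-in ones reporting Ready. A bash sketch; `nodepools_ready` is a hypothetical helper that reads each NodePool's `Ready` condition, and the NodePool names in the example comment are the ones used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical gate: succeed only if every named NodePool reports Ready=True.
# Usage: nodepools_ready <nodepool> [<nodepool> ...]
nodepools_ready() {
  local pool status
  for pool in "$@"; do
    status="$(kubectl get nodepool "$pool" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
    if [ "$status" != "True" ]; then
      echo "NodePool $pool is not ready (Ready=${status:-unknown})" >&2
      return 1
    fi
  done
  return 0
}

# Example cut-over: remove the custom pool only after the built-ins are ready.
# nodepools_ready general-purpose system && kubectl delete nodepool system-v2
```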

As a customer, I would like EKS Auto Mode to pick up new subnets. Starting with /25 subnets and running out of IP addresses seems like a common issue for a growing business.

answered a month ago
EXPERT
reviewed a month ago


Based on the behavior you're experiencing, EKS Auto Mode maintains strict control over its built-in NodePools and the default NodeClass, which explains why your manual modifications keep reverting. Here's what's happening and how to address it:

Understanding the Issue

The default NodeClass in EKS Auto Mode does not automatically sync with changes made to resourcesVpcConfig.subnetIds through the EKS API. EKS Auto Mode manages the built-in NodePools (general-purpose and system) and their associated default NodeClass independently, and it actively reconciles these resources back to its managed state, overriding manual changes regardless of management labels like app.kubernetes.io/managed-by: Helm.

Recommended Migration Approach

  1. Disable the built-in NodePools: Use the AWS CLI to disable EKS Auto Mode's built-in node pools:

aws eks update-cluster-config \
  --name <cluster-name> \
  --compute-config '{
    "nodeRoleArn": "arn:aws:iam::<account-id>:role/AmazonEKSAutoNodeRole",
    "nodePools": [],
    "enabled": true
  }'

  2. Delete the default NodeClass: After disabling the built-in node pools, delete the default NodeClass:

kubectl delete nodeclass default

If it's stuck in a Terminating state, force delete it:

kubectl patch nodeclass default -p '{"metadata":{"finalizers":null}}' --type=merge

  3. Create custom NodePools and NodeClass: Create your own NodePool and NodeClass resources with the new subnet configuration. When creating a custom NodeClass, specify the new subnet IDs in the subnetSelectorTerms. Then create custom NodePools that reference your custom NodeClass using the nodeClassRef field.

  4. Re-enable built-in NodePools (if needed): Once your custom resources are working correctly with the new subnets, you can optionally re-enable the built-in node pools. When re-enabled, they should pick up the current cluster configuration, though you may need to verify this behavior.
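Before removing finalizers as described above, it is worth checking whether any NodeClaims still reference the NodeClass, since the first answer warns that this can orphan EC2 resources. A bash sketch; `nodeclaims_using_nodeclass` and `finalizer_removal_safe` are hypothetical helpers built on `kubectl get nodeclaims` with custom columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list NodeClaims whose nodeClassRef names the given NodeClass.
# Usage: nodeclaims_using_nodeclass [<nodeclass-name>]
nodeclaims_using_nodeclass() {
  local nodeclass="${1:-default}"
  kubectl get nodeclaims --no-headers \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.nodeClassRef.name |
    awk -v c="$nodeclass" '$2 == c { print $1 }'
}

# Hypothetical guard: fail while any NodeClaim still depends on the NodeClass.
finalizer_removal_safe() {
  local nodeclass="${1:-default}" claims
  claims="$(nodeclaims_using_nodeclass "$nodeclass")"
  if [ -n "$claims" ]; then
    echo "NodeClaims still reference NodeClass $nodeclass:" >&2
    echo "$claims" >&2
    return 1
  fi
  return 0
}
```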

Why Management Labels Don't Work

EKS Auto Mode's reconciliation controller operates at a level that supersedes Kubernetes resource management labels. The service is designed to maintain the integrity of its managed resources, so it will override changes to built-in NodePools and the default NodeClass regardless of ownership annotations.

Alternative Consideration

If removing the old subnets from the VPC entirely is operationally feasible, this would force EKS Auto Mode to use only the available (new) subnets. However, the cleaner approach is to work with custom NodePools and NodeClasses rather than trying to modify the built-in resources.
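If you do take the subnet-deletion route, a pre-check that the subnet holds no network interfaces avoids a failed `delete-subnet` call (EC2 refuses to delete a subnet that still has dependencies). A bash sketch; `subnet_eni_count` and `delete_subnet_if_empty` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: count network interfaces still present in a subnet.
# Usage: subnet_eni_count <subnet-id>
subnet_eni_count() {
  aws ec2 describe-network-interfaces \
    --filters "Name=subnet-id,Values=$1" \
    --query 'length(NetworkInterfaces)' --output text
}

# Hypothetical helper: delete the subnet only if nothing is left in it.
delete_subnet_if_empty() {
  local count
  count="$(subnet_eni_count "$1")"
  if [ "$count" != "0" ]; then
    echo "subnet $1 still has ${count:-unknown} network interface(s)" >&2
    return 1
  fi
  aws ec2 delete-subnet --subnet-id "$1"
}
```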

The key insight is that EKS Auto Mode's built-in resources are meant to be managed by the service itself. For custom configurations like subnet migration, you should use custom NodePools and NodeClasses through the Kubernetes API rather than attempting to modify the built-in ones.
Sources
How do I troubleshoot custom nodepool and nodeclass provisioning issues in Amazon EKS Auto Mode? | AWS re:Post
Create an EKS Auto Mode Cluster with the eksctl CLI - Amazon EKS
How do I troubleshoot EKS Auto Mode built-in node pools with Unknown Status | AWS re:Post

answered a month ago
EXPERT
reviewed a month ago

Thank you for your answer, Florian. We followed all four steps of your recommended approach:

  1. Created private-v2 NodeClass — pointing to the new private subnets across three availability zones, deployed via the Helm infrastructure chart.

  2. Created general-purpose-v2 and system-v2 NodePools — both referencing private-v2, also deployed via Helm. These are confirmed untouched by EKS reconciliation (they remain managed-by: Helm).

  3. Drained all old nodes — workloads are now running on v2 NodePool nodes in the new subnets. All general-purpose workloads are confirmed to have IPs from the new subnets.

  4. "Disabling" the defaults — this is where we are stuck...

The system NodePool (EKS-managed, nodeClassRef: default) keeps provisioning a single node into the old subnet in availability zone A, which prevents us from deleting that subnet.

The default NodeClass still has subnetSelectorTerms pointing to the old subnet IDs, despite the EKS cluster's resourcesVpcConfig.subnetIds containing only the new subnets. EKS Auto Mode appears to have bootstrapped the default NodeClass once at cluster creation and never synced it again — even after we updated the cluster VPC config and deleted the default NodeClass (forcing EKS to recreate it).

Attempts to disable the system NodePool:

  • kubectl patch nodepool system --type='merge' -p='{"spec":{"limits":{"cpu":"0"}}}' — EKS reverts this within minutes. At the moment the NodePool shows limits.cpu: "0", yet a new system-* NodeClaim appears immediately after each delete.
  • Applying a NoSchedule taint to the system NodePool — EKS reverts taints on managed NodePools the same way.
  • Deleting the system NodeClaim — Karpenter immediately creates a new one in the same subnet.
  • Deleting the default NodeClass — EKS recreates it within ~60 seconds with the same old subnet IDs, even though the cluster VPC config no longer includes those subnets.
  • Direct EC2 terminate-instances — blocked by an explicit deny in a resource-based policy attached to EKS-managed node instances.
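To confirm which subnet the recurring system node lands in, each node's instance ID can be read from its `spec.providerID` (on EKS it has the form `aws:///<az>/<instance-id>`) and looked up in EC2. A bash sketch; `nodepool_node_subnets` is hypothetical and assumes the `karpenter.sh/nodepool` node label and permission to call `ec2:DescribeInstances`.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print "<node> <subnet-id>" for each node of a NodePool.
# Usage: nodepool_node_subnets <nodepool-name>
nodepool_node_subnets() {
  local pool="$1" node provider_id instance_id subnet
  for node in $(kubectl get nodes -l "karpenter.sh/nodepool=${pool}" \
      -o jsonpath='{.items[*].metadata.name}'); do
    provider_id="$(kubectl get node "$node" -o jsonpath='{.spec.providerID}')"
    instance_id="${provider_id##*/}"   # keep the part after the last slash
    subnet="$(aws ec2 describe-instances --instance-ids "$instance_id" \
      --query 'Reservations[0].Instances[0].SubnetId' --output text)"
    echo "$node $subnet"
  done
}
```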

Is there a supported way to prevent the EKS-managed system NodePool from provisioning new nodes when the default NodeClass references a subnet that no longer exists in the cluster's VPC config?

Specifically: the system NodePool schedules kube-system pods tolerating CriticalAddonsOnly:NoSchedule (e.g. metrics-server, efs-csi-controller). Since our custom system-v2 NodePool already has capacity for these workloads, we want the EKS system NodePool to remain dormant. However, no patch we apply (limits, taints) survives the EKS reconciliation loop.

Is there a supported operation (e.g. an EKS API call, a cluster config flag, or a safe way to update the default NodeClass subnet list) that would prevent the system NodePool from provisioning into the old subnet?

Any further inputs are greatly appreciated, thank you!

answered a month ago
EXPERT
reviewed a month ago
