Well, now the add-on is stuck in "Deleting" status after I tried to delete it while retaining the resources. I am also unable to get it to delete along with the existing resources.
Use the AWS CLI to describe the add-on and check its status.
aws eks describe-addon --cluster-name your-cluster-name --addon-name vpc-cni
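If you only need the status and any reported health issues, a JMESPath --query keeps the output short (this assumes the standard describe-addon response shape):
aws eks describe-addon --cluster-name your-cluster-name --addon-name vpc-cni --query addon.status --output text
aws eks describe-addon --cluster-name your-cluster-name --addon-name vpc-cni --query addon.health.issues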
Sometimes, there might be pending updates or dependencies that need to be resolved before the add-on can fully update.
Run the following to confirm the aws-node DaemonSet is present and healthy:
kubectl get daemonset aws-node -n kube-system
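To dig further into the DaemonSet's rollout state and its recent events, these standard kubectl commands can help:
kubectl describe daemonset aws-node -n kube-system
kubectl rollout status daemonset aws-node -n kube-system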
Use kubectl to check for any events that might indicate why the update is stuck.
kubectl get events -n kube-system
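Sorting by timestamp (a standard kubectl flag) makes the most recent failures easier to spot:
kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp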
You can try to re-trigger the update manually by reapplying the same version, or by rolling back and then applying the update again.
First, roll back to the previous version:
aws eks update-addon --cluster-name your-cluster-name --addon-name vpc-cni --resolve-conflicts OVERWRITE --addon-version previous-version
Then, try updating to the desired version again:
aws eks update-addon --cluster-name your-cluster-name --addon-name vpc-cni --resolve-conflicts OVERWRITE --addon-version 1.17.1
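After triggering the update, you can wait for the add-on to leave the updating state before checking it again; the waiter below ships with recent AWS CLI versions (verify it exists in yours):
aws eks wait addon-active --cluster-name your-cluster-name --addon-name vpc-cni
aws eks describe-addon --cluster-name your-cluster-name --addon-name vpc-cni --query addon.status --output text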
Hello,
Delete the stuck VPC CNI add-on using the AWS CLI, then reinstall it with the correct version. After that, verify that all aws-node pods are running smoothly in the cluster.
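A rough sketch of that flow, assuming you replace the add-on version shown with one that matches your cluster, and that the aws-node pods carry the default k8s-app=aws-node label:
aws eks delete-addon --cluster-name your-cluster-name --addon-name vpc-cni
aws eks wait addon-deleted --cluster-name your-cluster-name --addon-name vpc-cni
aws eks create-addon --cluster-name your-cluster-name --addon-name vpc-cni --addon-version v1.17.1-eksbuild.1 --resolve-conflicts OVERWRITE
kubectl get pods -n kube-system -l k8s-app=aws-node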
Please see the documentation link below for more information:
https://repost.aws/knowledge-center/eks-plan-upgrade-cluster

Hello,
You can resolve it by first retrying the delete with the AWS CLI command aws eks delete-addon --cluster-name <cluster_name> --addon-name <addon_name> (without --preserve, so the remaining resources are removed as well). If that doesn't work, manually delete any Kubernetes resources related to the add-on using kubectl, and check for stuck finalizers on those resources. After cleaning up the resources, retry the deletion.
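A hedged sketch of the finalizer cleanup, using the aws-node DaemonSet as an illustrative target (substitute whatever resource kubectl reports as stuck):
kubectl get daemonset aws-node -n kube-system -o jsonpath='{.metadata.finalizers}'
kubectl patch daemonset aws-node -n kube-system --type merge -p '{"metadata":{"finalizers":[]}}'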
I have resolved the issue now by deleting the aws-node DaemonSet, although this did cause downtime on the cluster. It was a staging cluster, so that was not a problem, but this would not be a suitable option for a production cluster. How would this be resolved with zero downtime?