It's working again. The hint to the solution came from the Pod Identity agent documentation, which also uses the pod-identity-webhook:
```
Check configuration of MutatingWebhookConfiguration named pod-identity-webhook
admissionReviewVersions of the webhook needs to be v1beta1 and doesn’t work with v1.
```
That did the trick.
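For anyone who hits the same thing, this is roughly how the setting can be inspected and changed with kubectl. It's only a sketch; the patch assumes the configuration contains a single webhook at index 0, so check the object first:

```sh
# Show the admissionReviewVersions currently configured on the IRSA webhook
kubectl get mutatingwebhookconfiguration pod-identity-webhook \
  -o jsonpath='{.webhooks[*].admissionReviewVersions}'

# Set it to v1beta1 (assumes a single webhook at index 0 -- verify the structure with
# `kubectl get mutatingwebhookconfiguration pod-identity-webhook -o yaml` first)
kubectl patch mutatingwebhookconfiguration pod-identity-webhook --type=json \
  -p '[{"op": "replace", "path": "/webhooks/0/admissionReviewVersions", "value": ["v1beta1"]}]'
```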
The issue you're experiencing, IRSA (IAM Roles for Service Accounts) not injecting environment variables into your pods, can have a few potential causes. Here are the steps to troubleshoot it:
1. Check the App Mesh controller: If you're using AWS App Mesh with your EKS cluster, there's a known issue where the App Mesh controller can interfere with IRSA. When App Mesh is enabled, it may prevent the application container from assuming the IRSA role, causing it to fall back to the node role instead. In this case, the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables might not be included in the application container.
2. Verify the OIDC provider: Ensure that the OIDC provider for your EKS cluster is still properly configured and associated with your IAM roles. The OIDC provider is crucial for IRSA to function correctly (see the command sketch after this list).
3. Inspect the aws-auth ConfigMap: Check the aws-auth ConfigMap in the kube-system namespace to make sure the IAM roles are correctly mapped to Kubernetes service accounts.
4. Review the pod-identity-mutating-webhook: As you mentioned, it seems the webhook might not be injecting the environment variables. Verify that the webhook is running correctly and check its logs for any errors.
5. Examine the service account annotations: Ensure that the service accounts associated with your pods have the correct annotations, particularly the "eks.amazonaws.com/role-arn" annotation.
6. Check for recent changes: Although you mentioned there was no EKS update involved, review any recent changes to your cluster configuration, networking, or IAM policies that might have affected IRSA functionality.
7. Investigate the "kubernetes.io/legacy-token-invalid-since" annotation: While removing this annotation from the external-dns secret didn't resolve the issue, it's worth investigating why this annotation is present on many of your secrets, including those in kube-system. This annotation is related to the transition from legacy service account tokens to the newer token request API, which could potentially impact IRSA functionality.
8. Verify the EKS cluster's IRSA configuration: Double-check that IRSA is still enabled for your cluster and that the necessary components are in place.
9. Restart the aws-node DaemonSet: Try deleting the aws-node pods in the kube-system namespace. This will force them to restart and potentially resolve any issues with the VPC CNI plugin, which is crucial for IRSA functionality (see the restart commands after this list).
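If it helps, here is a rough sketch of how points 2, 5, and 8 can be checked from the command line; my-cluster, my-namespace, my-app, and my-app-pod are placeholders for your own names:

```sh
# Point 2: the OIDC issuer of the cluster must match an IAM OIDC provider in the account
aws eks describe-cluster --name my-cluster \
  --query 'cluster.identity.oidc.issuer' --output text
aws iam list-open-id-connect-providers

# Point 5: the role annotation on the service account
kubectl -n my-namespace get serviceaccount my-app -o yaml | grep eks.amazonaws.com/role-arn

# Point 8: the mutating webhook that IRSA relies on must exist
kubectl get mutatingwebhookconfiguration pod-identity-webhook

# What actually got injected into a running pod
kubectl -n my-namespace exec my-app-pod -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'
```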
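Also note for point 9 that the webhook only mutates pods at admission time, so the variables only show up in pods created after a fix. A rollout restart of the affected workloads is enough (the deployment name is a placeholder):

```sh
# Recreate the aws-node pods so they pass through admission again
kubectl -n kube-system rollout restart daemonset aws-node

# Same idea for any affected application workload
kubectl -n my-namespace rollout restart deployment my-app
```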
If none of these steps resolve the issue, you may need to contact AWS support for further assistance, as there could be an underlying problem with the EKS cluster or the IRSA implementation.
Sources
App Mesh Kubernetes troubleshooting - AWS App Mesh
Configure Amazon VPC CNI plugin to use IRSA - Amazon EKS
Identity and Access Management - Amazon EKS
Thanks for the quick response. Let me quickly go through the list:
- App Mesh is not used in our EKS cluster.
- Check, everything is looking good with the IAM roles. The OIDC provider is present and configured with the correct provider URL from the EKS cluster.
- About the aws-auth ConfigMap: we only use it to map our IAM users to the EKS admin user, so there are no role-to-service-account mappings here and never have been. From my understanding those mappings are done via an annotation on the service account itself. Every service account using IRSA has the eks.amazonaws.com/role-arn annotation with the correct IAM role in place; this was mentioned in your point 5.
- I did activate the control plane audit logs and investigated the API server logs, but can't see any issues here.
- See no. 3 above.
- The only thing we did was update the cluster from time to time, so we are on version 1.32 at the moment.
- I guess the annotation is present due to a change since Kubernetes 1.29 that annotates those secrets, but I could not find any migration guide. Is there anything that needs to be done here to make use of the token request API?
- Checked that the node role has AmazonEKSWorkerNodePolicy attached and that the OIDC provider is associated with the cluster. As said before, I really double-checked everything from the IRSA debugging guide.
- Also did that already and can't see any issues in the pod logs:
```
aws-node          Installed /host/opt/cni/bin/aws-cni
aws-node          Installed /host/opt/cni/bin/egress-cni
aws-node          time="2025-03-18T12:52:56Z" level=info msg="Starting IPAM daemon... "
aws-node          time="2025-03-18T12:52:56Z" level=info msg="Checking for IPAM connectivity... "
aws-node          time="2025-03-18T12:52:58Z" level=info msg="Copying config file... "
aws-node          time="2025-03-18T12:52:58Z" level=info msg="Successfully copied CNI plugin binary and config file."
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="Copying CNI plugin binaries ..."
aws-eks-nodeagent {"level":"info","ts":"2025-03-18T12:53:08.453Z","caller":"metrics/metrics.go:23","msg":"Serving metrics on ","port":61680}
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="Copied all CNI plugin binaries to /host/opt/cni/bin"
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="Found primaryMAC <hide>"
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="Found primaryIF ens5"
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="Updated net/ipv4/conf/ens5/rp_filter to 2\n"
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="Updated net/ipv4/tcp_early_demux to 1\n"
aws-vpc-cni-init  time="2025-03-18T12:52:49Z" level=info msg="CNI init container done"
stream closed EOF for kube-system/aws-node-cd2fv (aws-vpc-cni-init)
```
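For completeness, this is roughly how I checked what the webhook does (or doesn't) add to an affected pod; namespace and pod name are just examples:

```sh
# Check whether the AWS_* env vars and the projected aws-iam-token volume were added to
# the pod spec at admission -- in our case nothing shows up
kubectl -n my-namespace get pod my-app-pod -o yaml \
  | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE|aws-iam-token'
```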