How do I troubleshoot an OIDC provider and IRSA in Amazon EKS?

My pods can't use the AWS Identity and Access Management (IAM) role permissions with the Amazon Elastic Kubernetes Service (Amazon EKS) service account token.

Short description

To troubleshoot issues with the OpenID Connect (OIDC) provider and IAM roles for service accounts (IRSA) in Amazon EKS, complete the steps in the following sections:

  • Check if you have an existing IAM OIDC provider for your cluster
  • Check that your IAM role has an attached IAM policy with the required permissions
  • Verify that the IAM role trust relationships are set correctly
  • Check if you created a service account
  • Verify that the service account has the correct IAM role annotations
  • Verify that you correctly specified the serviceAccountName in your pod
  • Check the environment variables and permissions
  • Verify that the application uses a supported AWS SDK
  • Check the pod user and group
  • Recreate pods
  • Verify that the audience is correct
  • Verify that you configured the correct thumbprint
  • For the AWS China Region, check the AWS_DEFAULT_REGION environment variable

Resolution

Check if you have an existing IAM OIDC provider for your cluster

If you don't have an IAM OIDC provider for your cluster, then you receive an error that's similar to the following message:

"WebIdentityErr: failed to retrieve credentials\ncaused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E\n\tstatus code: 400"

1.    Check your cluster's OIDC provider URL:

$ aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text

See the following example output:

https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E

2.    List the IAM OIDC providers in your account. Replace EXAMPLED539D4633E53DE1B716D3041E with the OIDC provider ID at the end of the URL that the previous command returned:

$ aws iam list-open-id-connect-providers | grep EXAMPLED539D4633E53DE1B716D3041E

See the following example output:

"Arn": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"

If the preceding command returns an output, then you already have a provider for your cluster. If the command doesn't return an output, then you must create an IAM OIDC provider.
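If you want to script steps 1 and 2, the following Python sketch performs the same comparison. It assumes that you pass in the issuer URL from describe-cluster and the parsed JSON output of aws iam list-open-id-connect-providers; the boto3 IAM client's list_open_id_connect_providers() response uses the same OpenIDConnectProviderList shape. The helper names are hypothetical and aren't part of an AWS SDK.

```python
from typing import Optional
from urllib.parse import urlparse


def oidc_id_from_issuer(issuer_url: str) -> str:
    """Return the provider ID, which is the last path segment of the issuer URL."""
    return urlparse(issuer_url).path.rstrip("/").rsplit("/", 1)[-1]


def find_provider_arn(issuer_url: str, providers: dict) -> Optional[str]:
    """Return the ARN of the IAM OIDC provider for this issuer, or None if there isn't one."""
    parsed = urlparse(issuer_url)
    # IAM OIDC provider ARNs end with the issuer host and path, without "https://".
    suffix = "oidc-provider/" + parsed.netloc + parsed.path.rstrip("/")
    for entry in providers.get("OpenIDConnectProviderList", []):
        if entry["Arn"].endswith(suffix):
            return entry["Arn"]
    return None


if __name__ == "__main__":
    issuer = "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
    providers = {"OpenIDConnectProviderList": [{
        "Arn": "arn:aws:iam::111122223333:oidc-provider/"
               "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
    }]}
    print(oidc_id_from_issuer(issuer))
    print(find_provider_arn(issuer, providers) or "No provider found; create an IAM OIDC provider")
```

To use live values instead of the sample data, you can pass boto3.client("eks").describe_cluster(name="cluster_name")["cluster"]["identity"]["oidc"]["issuer"] and boto3.client("iam").list_open_id_connect_providers() to find_provider_arn.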

Check that your IAM role has an attached IAM policy with the required permissions

1.    Open the IAM console.

2.    In the navigation pane, choose Roles.

3.    Choose the role that you want to verify.

4.    Under the Permissions tab, verify that this role has the required policy attached.
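To check the same thing from a script, you can compare the output of aws iam list-attached-role-policies --role-name EKS-IRSA (managed policies) and aws iam list-role-policies --role-name EKS-IRSA (inline policies) with the policy that you expect. The following Python function is a minimal sketch that assumes you pass in the parsed JSON from both commands. It doesn't follow pagination markers, and it only confirms that a policy is attached, not that the policy allows the actions that your application calls. The function name and the example policy are for illustration only.

```python
from typing import Dict


def role_has_policy(attached: Dict, inline: Dict, policy: str) -> bool:
    """Return True if policy matches an attached managed policy or an inline policy.

    attached is the output of list-attached-role-policies, and inline is the
    output of list-role-policies. Managed policies match by name or by ARN.
    """
    for item in attached.get("AttachedPolicies", []):
        if policy in (item["PolicyName"], item["PolicyArn"]):
            return True
    return policy in inline.get("PolicyNames", [])


if __name__ == "__main__":
    attached = {"AttachedPolicies": [{
        "PolicyName": "AmazonS3ReadOnlyAccess",
        "PolicyArn": "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    }]}
    inline = {"PolicyNames": []}
    print(role_has_policy(attached, inline, "AmazonS3ReadOnlyAccess"))
```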

Verify that the IAM role trust relationships are set correctly

With the AWS Management Console:

1.    Open the IAM console.

2.    In the navigation pane, choose Roles.

3.    Choose the role that you want to check.

4.    Choose the Trust Relationships tab to verify that the format of your policy matches the format of the following JSON policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME",
          "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
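To generate a trust policy in this format for a specific service account, you can use a sketch such as the following Python function. It derives the provider from the issuer URL that describe-cluster returns and assumes the standard aws partition by default (for the AWS China Regions, pass partition="aws-cn"). The function name is hypothetical.

```python
import json
from urllib.parse import urlparse


def build_irsa_trust_policy(account_id: str, issuer_url: str, namespace: str,
                            service_account: str, partition: str = "aws") -> dict:
    """Return a trust policy that lets one Kubernetes service account assume the role."""
    parsed = urlparse(issuer_url)
    # The provider is the issuer host and path without the "https://" scheme.
    provider = parsed.netloc + parsed.path.rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:{partition}:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


if __name__ == "__main__":
    policy = build_irsa_trust_policy(
        "111122223333",
        "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E",
        "default",
        "irsa",
    )
    print(json.dumps(policy, indent=2))
```

If you save the printed JSON to a file such as trust.json, then you can apply it with aws iam update-assume-role-policy --role-name EKS-IRSA --policy-document file://trust.json.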

To verify the trust relationship with the AWS Command Line Interface (AWS CLI), run the following command with your role name:

$ aws iam get-role --role-name EKS-IRSA

Note: Replace EKS-IRSA with your IAM role name.

In the output JSON, look for the AssumeRolePolicyDocument section.

See the following example output:

{
  "Role": {
    "Path": "/",
    "RoleName": "EKS-IRSA",
    "RoleId": "AROAQ55NEXAMPLELOEISVX",
    "Arn": "arn:aws:iam::ACCOUNT_ID:role/EKS-IRSA",
    "CreateDate": "2021-04-22T06:39:21+00:00",
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com",
              "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME"
            }
          }
        }
      ]
    },
    "MaxSessionDuration": 3600,
    "RoleLastUsed": {
      "LastUsedDate": "2021-04-22T07:01:15+00:00",
      "Region": "AWS_REGION"
    }
  }
}

Note: Check that you specified the correct account ID, OIDC provider ID, AWS Region, Kubernetes service account name, and Kubernetes namespace.
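The following Python sketch runs these checks against the Role object in the get-role output. It assumes the aws partition and an IRSA statement that uses StringEquals conditions in the format of the preceding JSON policy, including the aud condition; it doesn't evaluate StringLike wildcards. The function name is hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def _as_list(value) -> list:
    """IAM policy fields can hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def trust_policy_problems(role: dict, account_id: str, issuer_url: str,
                          namespace: str, service_account: str) -> List[str]:
    """Return mismatches between the role's trust policy and the expected IRSA values.

    An empty list means that at least one statement matches.
    """
    parsed = urlparse(issuer_url)
    provider = parsed.netloc + parsed.path.rstrip("/")
    want_principal = f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
    want_sub = f"system:serviceaccount:{namespace}:{service_account}"

    problems: List[str] = []
    for st in role["AssumeRolePolicyDocument"].get("Statement", []):
        actions = _as_list(st.get("Action", []))
        if st.get("Effect") != "Allow" or "sts:AssumeRoleWithWebIdentity" not in actions:
            continue  # Not an IRSA statement.
        found = []
        principal = st.get("Principal", {})
        federated = _as_list(principal.get("Federated", [])) if isinstance(principal, dict) else []
        if want_principal not in federated:
            found.append(f"Federated principal {federated} doesn't include {want_principal}")
        equals = st.get("Condition", {}).get("StringEquals", {})
        subs = _as_list(equals.get(provider + ":sub", []))
        if want_sub not in subs:
            found.append(f"sub condition {subs} doesn't include {want_sub}")
        if "sts.amazonaws.com" not in _as_list(equals.get(provider + ":aud", [])):
            found.append("aud condition doesn't include sts.amazonaws.com")
        if not found:
            return []  # This statement trusts the service account.
        problems.extend(found)
    return problems or ["No Allow statement for sts:AssumeRoleWithWebIdentity"]
```

To check a live role, you can pass boto3.client("iam").get_role(RoleName="EKS-IRSA")["Role"], because boto3 returns AssumeRolePolicyDocument as a parsed dictionary.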

Check if you created a service account

Use the following command:

$ kubectl get sa -n YOUR_NAMESPACE

Note: Replace YOUR_NAMESPACE with your Kubernetes namespace.

See the following example output:

NAME      SECRETS   AGE
default   1         28d
irsa      1         66m

If you don't have a service account, see Configure service accounts for pods (from the Kubernetes website).

Verify that the service account has the correct IAM role annotations

Use the following command:

$ kubectl describe sa irsa -n YOUR_NAMESPACE

Note: Replace irsa with your Kubernetes service account name. Replace YOUR_NAMESPACE with your Kubernetes namespace.

See the following example output:

Name:                irsa
Namespace:           default
Labels:              <none>
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME
Image pull secrets:  <none>
Mountable secrets:   irsa-token-v5rtc
Tokens:              irsa-token-v5rtc
Events:              <none>
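To check the annotation from a script, you can parse the output of kubectl get sa irsa -n YOUR_NAMESPACE -o json. The following Python sketch confirms that the eks.amazonaws.com/role-arn annotation exists, looks like an IAM role ARN, and matches the role that you expect. The ARN pattern and the helper names are assumptions for illustration.

```python
import json
import re
from typing import Optional

ROLE_ARN_ANNOTATION = "eks.amazonaws.com/role-arn"
# Assumed pattern: role ARNs in the aws, aws-cn, and aws-us-gov partitions, with optional role paths.
ROLE_ARN_RE = re.compile(r"^arn:aws(-cn|-us-gov)?:iam::\d{12}:role/[\w+=,.@/-]+$")


def service_account_role_arn(sa: dict) -> Optional[str]:
    """Return the IRSA role ARN annotation from a ServiceAccount object, or None."""
    annotations = sa.get("metadata", {}).get("annotations") or {}
    return annotations.get(ROLE_ARN_ANNOTATION)


def check_service_account(sa_json: str, expected_role_arn: str) -> str:
    """Return "ok" or a short description of what's wrong with the annotation."""
    arn = service_account_role_arn(json.loads(sa_json))
    if arn is None:
        return f"missing {ROLE_ARN_ANNOTATION} annotation"
    if not ROLE_ARN_RE.match(arn):
        return f"annotation value {arn!r} isn't a valid IAM role ARN"
    if arn != expected_role_arn:
        return f"annotation points to {arn}, expected {expected_role_arn}"
    return "ok"


if __name__ == "__main__":
    sa_json = json.dumps({
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": "irsa",
            "namespace": "default",
            "annotations": {ROLE_ARN_ANNOTATION: "arn:aws:iam::111122223333:role/EKS-IRSA"},
        },
    })
    print(check_service_account(sa_json, "arn:aws:iam::111122223333:role/EKS-IRSA"))
```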

Verify that you correctly specified the serviceAccountName in your pod

Use the following command:

$ kubectl get pod POD_NAME -o yaml -n YOUR_NAMESPACE | grep -i serviceAccountName:

Note: Replace POD_NAME and YOUR_NAMESPACE with your Kubernetes pod and namespace.

See the following example output:

serviceAccountName: irsa

Check the environment variables and permissions

Look for AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE in the pod's environment variables:

$ kubectl -n YOUR_NAMESPACE exec -it POD_NAME -- env | grep AWS

See the following example output:

AWS_REGION=ap-southeast-2
AWS_ROLE_ARN=arn:aws:iam::111122223333:role/EKS-IRSA
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=ap-southeast-2
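If either variable is missing, then the pod usually wasn't mutated for IRSA, for example because the service account annotation or the pod's serviceAccountName is wrong, or because the pod was created before you applied IRSA. The following Python sketch parses the KEY=VALUE lines that the preceding command prints and reports which of the two variables are missing. The helper names are hypothetical.

```python
from typing import Dict, List

# The two variables to look for in the pod's environment.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def parse_env(output: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, as printed by env, into a dictionary."""
    env: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # Skip lines that don't contain "=".
            env[key.strip()] = value.strip()
    return env


def missing_irsa_vars(output: str) -> List[str]:
    """Return the required IRSA variables that are absent or empty."""
    env = parse_env(output)
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    sample = """AWS_REGION=ap-southeast-2
AWS_ROLE_ARN=arn:aws:iam::111122223333:role/EKS-IRSA
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=ap-southeast-2"""
    print(missing_irsa_vars(sample) or "Both IRSA variables are set")
```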

Verify that the application uses a supported AWS SDK

The SDK version must be greater than or equal to the following values:

Java (Version 2) — 2.10.11
Java — 1.11.704
Go — 1.23.13
Python (Boto3) — 1.9.220
Python (botocore) — 1.12.200
AWS CLI — 1.16.232
Node — 3.15.0
Ruby — 2.11.345
C++ — 1.7.174
.NET — 3.3.659.1
PHP — 3.110.7
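To compare an installed version with these minimums, you can use a small helper such as the following Python sketch. The dictionary keys are hypothetical labels for the SDKs in the preceding list, and the parser assumes dotted numeric versions and stops at suffixes such as -beta.

```python
import itertools
from typing import Tuple

# Minimum versions from the preceding list. The keys are hypothetical labels.
MINIMUM_SDK_VERSIONS = {
    "java-v2": "2.10.11",
    "java": "1.11.704",
    "go": "1.23.13",
    "boto3": "1.9.220",
    "botocore": "1.12.200",
    "aws-cli": "1.16.232",
    "node": "3.15.0",
    "ruby": "2.11.345",
    "cpp": "1.7.174",
    "dotnet": "3.3.659.1",
    "php": "3.110.7",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.9.220" into (1, 9, 220), stopping at the first non-numeric part."""
    parts = []
    for piece in version.strip().split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # A suffix such as "-beta" ends the numeric version.
    return tuple(parts)


def supports_irsa(sdk: str, installed_version: str) -> bool:
    """Return True if installed_version meets the minimum for this SDK."""
    return parse_version(installed_version) >= parse_version(MINIMUM_SDK_VERSIONS[sdk])
```

For example, in a Python application you can pass boto3.__version__ and botocore.__version__ to supports_irsa. Tuples compare element by element, so 3.3.659 is treated as older than 3.3.659.1.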

To check the latest supported SDK versions, see Using a supported AWS SDK.

Check the pod user and group

Use the following command:

$ kubectl exec -it POD_NAME -n YOUR_NAMESPACE -- id

See the following example output:

uid=0(root) gid=0(root) groups=0(root)

Note: By default, only containers that run as root have the proper file system permissions to read the web identity token file.

If your containers aren't running as root, then you can receive the following errors:

Error: PermissionError: [Errno 13] Permission denied: '/var/run/secrets/eks.amazonaws.com/serviceaccount/token'

-or-

WebIdentityErr: failed fetching WebIdentity token: \ncaused by: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token\ncaused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied
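To confirm which case applies, you can run a short check inside the container, assuming that the image includes Python 3 (for example, through kubectl exec). The following sketch tries to open and read the token file, which the SDK must be able to do, and reports the process's user and group IDs when the read fails. The function name is hypothetical.

```python
import os
from typing import Optional, Tuple

DEFAULT_TOKEN_FILE = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"


def check_token_file(path: Optional[str] = None) -> Tuple[bool, str]:
    """Try to read the web identity token and describe the result."""
    path = path or os.environ.get("AWS_WEB_IDENTITY_TOKEN_FILE", DEFAULT_TOKEN_FILE)
    try:
        with open(path, encoding="utf-8") as f:
            token = f.read().strip()
    except FileNotFoundError:
        return False, f"{path} doesn't exist; the token volume might not be mounted"
    except PermissionError:
        return False, f"uid {os.getuid()} gid {os.getgid()} can't read {path}"
    if not token:
        return False, f"{path} is empty"
    return True, f"{path} is readable"


if __name__ == "__main__":
    ok, message = check_token_file()
    print(message)
```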

To provide the proper file system permissions, run your containers as root. Or, for clusters that run Kubernetes version 1.18 or earlier, set fsGroup in the pod-level security context of your manifest, as in the following example:

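apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: my-app
      containers:
      - name: my-app
        image: my-app:latest
      # Pod-level security context: fsGroup sets the group that owns the mounted volumes,
      # including the projected web identity token.
      securityContext:
        fsGroup: 1337
...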

Note: The fsGroup ID is arbitrary. You can choose any valid group ID. The preceding security context setting is not required for clusters 1.19 or later.

Recreate pods

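If you created pods before you applied IRSA, then recreate the pods. The Amazon EKS Pod Identity Webhook adds the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables and the token volume only when a pod is created, so existing pods don't pick up the service account annotation.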

See the following example command:

$ kubectl rollout restart deploy nginx

See the following example output:

deployment.apps/nginx restarted

For DaemonSets or StatefulSets, use the following commands:

$ kubectl rollout restart daemonset DAEMONSET_NAME

$ kubectl rollout restart statefulset STATEFULSET_NAME

If you created a standalone pod that a controller such as a Deployment doesn't manage, then you must delete the pod and recreate it.

See the following example command to delete the pod:

$ kubectl delete pod POD_NAME

See the following example command to recreate the pod:

$ kubectl apply -f SPEC_FILE

Note: Replace SPEC_FILE with your Kubernetes manifest file path and file name.

Verify that the audience is correct

If you created the OIDC provider with the incorrect audience, then you receive the following error:

Error - An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation: Incorrect token audience

Check the IAM identity provider for your cluster. Make sure that the ClientIDList includes sts.amazonaws.com:

$ aws iam get-open-id-connect-provider --open-id-connect-provider-arn arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E

See the following example output:

{
  "Url": "oidc.eks.AWS_REGION.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E",
  "ClientIDList": [
    "sts.amazonaws.com"
  ],
  "ThumbprintList": [
    "9e99a48a9960b14926bb7f3b02e22da2b0ab7280"
  ],
  "CreateDate": "2021-01-21T04:29:09.788000+00:00",
  "Tags": []
}
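You can also confirm that the token that the pod receives carries the same audience. The projected token is a JWT, so you can decode its claims for inspection, for example after you copy it with kubectl exec POD_NAME -n YOUR_NAMESPACE -- cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token. The following Python sketch decodes the claims without verifying the signature and checks both the ClientIDList and the token's aud claim. Treat the token as a credential and don't paste it into logs or tickets. The helper names are hypothetical.

```python
import base64
import json
from typing import Dict, List

STS_AUDIENCE = "sts.amazonaws.com"


def decode_jwt_claims(token: str) -> Dict:
    """Decode a JWT payload for inspection only; this doesn't verify the signature."""
    payload = token.strip().split(".")[1]
    payload += "=" * (-len(payload) % 4)  # Restore the base64url padding that JWTs omit.
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_problems(provider: Dict, token: str) -> List[str]:
    """Check the provider's ClientIDList and the token's aud claim for sts.amazonaws.com."""
    problems = []
    client_ids = provider.get("ClientIDList", [])
    if STS_AUDIENCE not in client_ids:
        problems.append(f"ClientIDList {client_ids} doesn't include {STS_AUDIENCE}")
    aud = decode_jwt_claims(token).get("aud")
    if STS_AUDIENCE not in (aud if isinstance(aud, list) else [aud]):
        problems.append(f"Token aud claim is {aud!r}, not {STS_AUDIENCE}")
    return problems
```

Pass the parsed get-open-id-connect-provider output as provider. The decoded claims also include sub and iss, which you can compare with the trust policy conditions.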

Verify that you configured the correct thumbprint

If the thumbprint that's configured in the IAM OIDC provider isn't correct, then you can receive the following error:

failed to retrieve credentials caused by: InvalidIdentityToken: OpenIDConnect provider's HTTPS certificate doesn't match configured thumbprint

To automatically configure the correct thumbprint, use eksctl or the AWS Management Console to create the IAM identity provider. For other ways to obtain a thumbprint, see Obtaining the thumbprint for an OpenID Connect identity provider.
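If you obtain the certificate yourself, the thumbprint that IAM stores is the hex-encoded SHA-1 hash of the DER-encoded certificate. The following Python sketch computes that value from a PEM certificate and compares it, case-insensitively, with the ThumbprintList from get-open-id-connect-provider. It assumes that you already extracted the correct certificate from the chain as described in Obtaining the thumbprint for an OpenID Connect identity provider. The helper names are hypothetical.

```python
import hashlib
import ssl
from typing import Iterable


def thumbprint_from_der(der_cert: bytes) -> str:
    """Return the hex-encoded SHA-1 hash of a DER-encoded certificate."""
    return hashlib.sha1(der_cert).hexdigest()


def thumbprint_from_pem(pem_cert: str) -> str:
    """Convert a PEM certificate to DER, then hash it."""
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem_cert))


def thumbprint_configured(thumbprint: str, thumbprint_list: Iterable[str]) -> bool:
    """Compare case-insensitively, because hex digests can be upper or lower case."""
    return thumbprint.lower() in (t.lower() for t in thumbprint_list)
```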

For the AWS China Region, check the AWS_DEFAULT_REGION environment variable

If you use IRSA for a pod or daemonset that's deployed to a cluster in the AWS China Region, then set the AWS_DEFAULT_REGION environment variable in the pod specification. If you don't, the pod or daemonset can receive the following error:

An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid

Use the following example to add the AWS_DEFAULT_REGION environment variable to your pod or daemonset specification:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: my-app
      containers:
      - name: my-app
        image: my-app:latest
        env:
        - name: AWS_DEFAULT_REGION
          # Replace AWS_REGION with your China Region, for example cn-north-1.
          value: "AWS_REGION"
...
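To check a pod's environment for this condition from a script, you can use a sketch such as the following Python function. It assumes that a role ARN in the aws-cn partition indicates a deployment in a China Region and that China Region names start with cn-, such as cn-north-1 or cn-northwest-1. The function names are hypothetical.

```python
import os
from typing import Dict, List


def arn_partition(arn: str) -> str:
    """Return the partition, the second field of an ARN (for example aws or aws-cn)."""
    parts = arn.split(":")
    return parts[1] if len(parts) > 1 else ""


def china_region_problems(env: Dict[str, str]) -> List[str]:
    """Report AWS_DEFAULT_REGION issues for pods whose IRSA role is in the aws-cn partition."""
    if arn_partition(env.get("AWS_ROLE_ARN", "")) != "aws-cn":
        return []  # Not a China Region role, so this check doesn't apply.
    region = env.get("AWS_DEFAULT_REGION", "")
    if not region:
        return ["AWS_DEFAULT_REGION isn't set"]
    if not region.startswith("cn-"):
        return [f"AWS_DEFAULT_REGION is {region}, expected a China Region such as cn-north-1"]
    return []


if __name__ == "__main__":
    # When run inside the container, check the container's own environment.
    print(china_region_problems(dict(os.environ)) or "No AWS_DEFAULT_REGION problems found")
```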

AWS OFFICIAL