How do I troubleshoot issues with my EBS volume mounts in Amazon EKS?

I'm receiving the following error in my pods when mounting Amazon Elastic Block Store (Amazon EBS) volumes in my Amazon Elastic Kubernetes Service (Amazon EKS) cluster: "Timeout expired waiting for volumes to attach or mount for pod"

Resolution

Before you begin the troubleshooting steps, verify that you have the following:

  • The required AWS Identity and Access Management (IAM) permissions for your "ebs-csi-controller-sa" service account IAM role.
  • A valid PersistentVolumeClaim (PVC) in the same namespace as the pod.
  • A valid EBS storage class definition that uses the in-tree provisioner "kubernetes.io/aws-ebs" or the Amazon EBS Container Storage Interface (CSI) driver provisioner "ebs.csi.aws.com", as shown in the example after this list.
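
For reference, a minimal StorageClass and PVC that use the Amazon EBS CSI driver provisioner look similar to the following sketch. The names "ebs-sc" and "ebs-claim", the gp3 volume type, and the 4Gi size are example values to adjust for your environment.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc                    # example name
provisioner: ebs.csi.aws.com      # Amazon EBS CSI driver provisioner
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3                       # example EBS volume type
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim                 # must be in the same namespace as the pod
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi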

Verify that the Amazon EBS CSI driver controller and node pods are running

The Amazon EBS CSI driver consists of controller pods that run as a Deployment and node pods that run as a DaemonSet. Run the following command to verify that these pods are running in your cluster:

kubectl get all -l app.kubernetes.io/name=aws-ebs-csi-driver -n kube-system

Note: The Amazon EBS CSI driver isn't supported on Windows worker nodes or on AWS Fargate.

Make sure that the installed Amazon EBS CSI driver version is compatible with your cluster's Kubernetes version.
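
If you installed the driver as an Amazon EKS managed add-on, one way to check the installed version is the following AWS CLI command; "my-cluster" is a placeholder for your cluster name. For a self-managed installation, you can instead inspect the ebs-plugin container image tag.

aws eks describe-addon --cluster-name my-cluster --addon-name aws-ebs-csi-driver --query "addon.addonVersion" --output text

kubectl get deployment ebs-csi-controller -n kube-system -o jsonpath='{.spec.template.spec.containers[?(@.name=="ebs-plugin")].image}'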

Verify if the PVC encountered issues while binding to the EBS persistent volume

To verify if the PVC encountered issues, run the following command to view events. In the following example command, replace pvc-name and namespace with the correct values for your environment.

kubectl describe pvc <pvc-name> -n <namespace>

If you're using dynamic volume provisioning, review the returned events to determine if volume provisioning succeeded or failed. You can also see the corresponding persistent volume name that the PVC is bound to, as shown in the following example:

Name:          ebs-claim
Namespace:     default
StorageClass:  ebs-sc
Status:        Bound
Volume:        pvc-5cbd76de-6f15-41e4-9948-2bba2574e205
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
               volume.kubernetes.io/selected-node: ip-10-0-2-57.ec2.internal
. . . . .
. . . . . 
Events:
  Type    Reason                 Age                    From                                                                                      Message
  ----    ------                 ----                   ----                                                                                      -------
. . . . .
  Normal  Provisioning           5m22s                  ebs.csi.aws.com_ebs-csi-controller-57d4cbb9cc-dr9cd_8f0373e8-4e58-4dd0-b83c-da6f9ad5d5ce  External provisioner is provisioning volume for claim "default/ebs-claim"
  Normal  ProvisioningSucceeded  5m18s                  ebs.csi.aws.com_ebs-csi-controller-57d4cbb9cc-dr9cd_8f0373e8-4e58-4dd0-b83c-da6f9ad5d5ce  Successfully provisioned volume pvc-5cbd76de-6f15-41e4-9948-2bba2574e205

If the provisioning failed, find the error message in events.

Review the Amazon EBS CSI controller pods' logs

Check the controller pod logs to understand the cause of the mount failures. If the volume is failing during creation, refer to the ebs-plugin and csi-provisioner logs. Run the following commands to retrieve the ebs-plugin container logs:

kubectl logs deployment/ebs-csi-controller -n kube-system -c ebs-plugin
kubectl logs daemonset/ebs-csi-node -n kube-system -c ebs-plugin

Run the following command to retrieve the csi-provisioner container logs:

kubectl logs deployment/ebs-csi-controller -n kube-system -c csi-provisioner

If the EBS volumes are failing to attach to the pod, review the csi-attacher logs to understand why. Run the following command to retrieve the csi-attacher container logs:

kubectl logs deployment/ebs-csi-controller -n kube-system -c csi-attacher
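
To narrow down the cause, you can filter any of these container logs for error-level messages, for example:

kubectl logs deployment/ebs-csi-controller -n kube-system -c csi-attacher | grep -iE "error|failed"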

Verify that the Amazon EBS CSI driver controller service account is annotated with the correct IAM role and that the IAM role has the required permissions

Unauthorized errors in your PVC events or in your ebs-csi-controller logs typically indicate a problem with the service account IAM role configuration. To troubleshoot, complete the following steps:

1.    Run the following command to determine if the service account used by ebs-csi-controller pods has the correct annotation:

kubectl describe sa ebs-csi-controller-sa -n kube-system

Verify that the following annotation is present:

eks.amazonaws.com/role-arn = arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole

2.    Verify that the IAM OIDC provider for the cluster is created, and that the IAM role has the required permissions to perform EBS API calls. Also, verify that the IAM role's trust policy trusts the service account ebs-csi-controller-sa.
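
The following commands are one way to check these items. The cluster name "my-cluster" and the role name "AmazonEKS_EBS_CSI_DriverRole" are placeholders for your own values.

# OIDC issuer URL for the cluster; an IAM OIDC provider with this URL must exist in the account
aws eks describe-cluster --name my-cluster --query "cluster.identity.oidc.issuer" --output text
aws iam list-open-id-connect-providers

# Trust policy and attached permissions of the driver's IAM role
aws iam get-role --role-name AmazonEKS_EBS_CSI_DriverRole --query "Role.AssumeRolePolicyDocument"
aws iam list-attached-role-policies --role-name AmazonEKS_EBS_CSI_DriverRole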

3.    Review your account's AWS CloudTrail logs to verify that the CreateVolume, AttachVolume, and DetachVolume calls are being made. Also review the CloudTrail logs to determine which principal made the calls. This information helps you determine whether the controller is using the service account IAM role or the worker node IAM role.
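
For example, the following command is one way to spot-check recent AttachVolume calls in the CloudTrail event history; the full calling principal is available in each event's CloudTrailEvent JSON.

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=AttachVolume \
  --max-results 5 \
  --query "Events[*].{Time:EventTime,User:Username}" \
  --output table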

Verify the persistent volume's node affinity

Each persistent volume is created with a node affinity that limits attachment to nodes within a single Availability Zone. This is because an EBS volume can be attached only to pods and nodes that run in the same Availability Zone where the volume was created. If pods that are scheduled onto nodes in one Availability Zone try to use an EBS persistent volume in a different Availability Zone, then you receive an error similar to the following:

FailedScheduling: 1 node(s) had volume node affinity conflict

To avoid this, use a StatefulSet instead of a Deployment so that a unique EBS volume is created for each pod of the StatefulSet in the same Availability Zone as the pod, as shown in the following example.
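
The following is a minimal sketch of a StatefulSet that uses volumeClaimTemplates to provision a separate EBS-backed PVC for each pod. The names "app" and "data", the container image, and the storage class "ebs-sc" are example values.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app                  # headless Service for the StatefulSet; create it separately
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: public.ecr.aws/amazonlinux/amazonlinux:2023
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-sc    # example EBS storage class
        resources:
          requests:
            storage: 10Gi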

You can verify the persistent volume's node affinity by running the following command. In the following example command, replace persistent-volume-name with your volume's name.

kubectl describe pv <persistent-volume-name>
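
In the output, the Node Affinity section shows the Availability Zone constraint. For a volume that the EBS CSI driver provisioned, it looks similar to the following (the topology key can differ for in-tree volumes, for example topology.kubernetes.io/zone):

Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [us-east-1a]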

Note: You can't mount an EBS volume to two different pods that run on two different worker nodes. An EBS volume can be attached to pods that run on a single node, but it can't be attached to a second node at the same time. If you try to attach your EBS volume to pods on two different worker nodes, then the pod fails to start and you receive an error similar to the following:

Warning FailedAttachVolume 2m38s attachdetach-controller Multi-Attach error for volume "pvc-1cccsdfdc8-fsdf6-43d6-a1a9-ea837hf7h57fa" Volume is already exclusively attached to one node and can't be attached to another

Make sure that your EBS controller pods have connectivity to the EC2 API

If you see connection timeout errors in the ebs-csi-controller logs, then the EBS CSI controller pods might not be able to reach the Amazon EC2 API. If the controller pods have connectivity issues to the EC2 API, then you see an error similar to the following when you create your PVC:

Warning   ProvisioningFailed       persistentvolumeclaim/storage-volume-1   failed to provision volume with StorageClass "ebs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded

To correct this error, verify that the subnets for the EBS CSI controller pods have connectivity to the Amazon EC2 API. If you're running a private cluster with an HTTP/HTTPS proxy, then verify that your EBS CSI controller pods are configured to use the proxy. The EBS CSI driver's Helm installation supports configuring an HTTP/HTTPS proxy.
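
One way to spot-check connectivity from inside the cluster is to run a short-lived pod that calls the regional EC2 endpoint. The image and the us-east-1 endpoint below are example values; any HTTP status code in the output means the endpoint is reachable, while a timeout points to a connectivity problem.

kubectl run ec2-connectivity-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sS --max-time 10 -o /dev/null -w "%{http_code}\n" https://ec2.us-east-1.amazonaws.com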

