How do I resolve the ErrImagePull and ImagePullBackOff pod status errors in Amazon EKS?

My Amazon Elastic Kubernetes Service (Amazon EKS) pod is stuck in the ErrImagePull or ImagePullBackOff status.

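Short description

If you run the kubectl get pods command and your pod is in the ImagePullBackOff status, then the pod isn't running correctly. ErrImagePull means that an attempt to pull the image failed. ImagePullBackOff means that the container can't start because the image can't be retrieved or pulled, and the kubelet is waiting before it retries. For more information, see Amazon EKS connector pods are in ImagePullBackOff status.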

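You might receive an image pull error in the following situations:

  • The image name, tag, or digest is incorrect.
  • The image requires credentials for authentication.
  • The registry isn't accessible.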

Resolution

1. Check the pod status and error message, and verify that the image name, tag, and SHA are correct

To get the status of a pod, run the kubectl get pods command:

$ kubectl get pods -n default
NAME                              READY   STATUS             RESTARTS   AGE
nginx-7cdbb5f49f-2p6p2            0/1     ImagePullBackOff   0          86s

To get the details of the pod's error message, run the kubectl describe pod command:

$ kubectl describe pod nginx-7cdbb5f49f-2p6p2

...
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m23s                 default-scheduler  Successfully assigned default/nginx-7cdbb5f49f-2p6p2 to ip-192-168-149-143.us-east-2.compute.internal
  Normal   Pulling    2m44s (x4 over 4m9s)  kubelet            Pulling image "nginxx:latest"
  Warning  Failed     2m43s (x4 over 4m9s)  kubelet            Failed to pull image "nginxx:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for nginxx, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     2m43s (x4 over 4m9s)  kubelet            Error: ErrImagePull
  Warning  Failed     2m32s (x6 over 4m8s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m17s (x7 over 4m8s)  kubelet            Back-off pulling image "nginxx:latest"

In the preceding example, the image name nginxx is misspelled, so the repository can't be found. In the following example, the image name is correct but the tag latestttt doesn't exist:

$ kubectl describe pod nginx-55d75d5f56-qrqmp

...
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m20s               default-scheduler  Successfully assigned default/nginx-55d75d5f56-qrqmp to ip-192-168-149-143.us-east-2.compute.internal
  Normal   Pulling    40s (x4 over 2m6s)  kubelet            Pulling image "nginx:latestttt"
  Warning  Failed     39s (x4 over 2m5s)  kubelet            Failed to pull image "nginx:latestttt": rpc error: code = Unknown desc = Error response from daemon: manifest for nginx:latestttt not found: manifest unknown: manifest unknown
  Warning  Failed     39s (x4 over 2m5s)  kubelet            Error: ErrImagePull
  Warning  Failed     26s (x6 over 2m5s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    11s (x7 over 2m5s)  kubelet            Back-off pulling image "nginx:latestttt"

Make sure that the image name and tag exist and are correct. If the image registry requires authentication, then make sure that you're authorized to access it. To verify the image that the pod uses, run the following command:

$ kubectl get pods nginx-7cdbb5f49f-2p6p2 -o jsonpath="{.spec.containers[*].image}" | sort

nginxx:latest

To understand the pod status values, see Pod phase on the Kubernetes website.

For more information, see How do I troubleshoot the pod status in Amazon EKS?
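
The failure messages quoted in this article follow a few recognizable patterns, so a first triage step can be scripted. The following is a minimal sketch, not an official tool: classify_pull_error and its category names are hypothetical, and the patterns are taken only from the event messages shown in this article. Other container runtimes or registries might word these errors differently.

```shell
# Hypothetical helper: map a kubelet "Failed to pull image" message to a likely cause.
# The match patterns come from the event messages quoted in this article.
classify_pull_error() {
  local message="$1"
  case "$message" in
    *"toomanyrequests"*)
      echo "rate-limit" ;;            # registry rate limit exceeded
    *"Client.Timeout exceeded"*)
      echo "network" ;;               # registry endpoint not reachable in time
    *"manifest unknown"*)
      echo "wrong-tag" ;;             # repository exists, tag or digest doesn't
    *"pull access denied"*|*"repository does not exist"*)
      echo "wrong-name-or-auth" ;;    # misspelled name or missing credentials
    *)
      echo "unknown" ;;
  esac
}

# Example against a cluster (requires kubectl; shown as a comment so nothing runs here):
#   kubectl get events -n default --field-selector reason=Failed \
#     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' |
#     while IFS= read -r line; do classify_pull_error "$line"; done
```

The "wrong-name-or-auth" category is deliberately ambiguous, because the registry returns the same message for a repository that doesn't exist and for one that requires credentials.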

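2. Amazon Elastic Container Registry (Amazon ECR) images

If you use Amazon EKS to pull images from Amazon ECR, then additional configuration might be required. If your image is stored in a private Amazon ECR registry, then make sure that you specify the imagePullSecrets credentials on the pod. These credentials are used to authenticate with the private registry.

Create a Secret named regcred: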

kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>

Be sure to replace the following values:

  • <your-registry-server> is your private Docker registry FQDN. For Docker Hub, use https://index.docker.io/v1/.
  • <your-name> is your Docker user name.
  • <your-pword> is your Docker password.
  • <your-email> is your Docker email address.

You've now set your Docker credentials in the cluster as a Secret named regcred.
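
For an Amazon ECR private registry, the same command can be fed from the AWS CLI. The following is a sketch under stated assumptions: the function names are hypothetical, the host name format applies to standard commercial AWS Regions, and the user name is the literal string AWS with a password from aws ecr get-login-password. That token expires after 12 hours, so the Secret must be refreshed periodically. Worker nodes whose IAM role already has read access to the repository can often pull from Amazon ECR without an imagePullSecrets entry.

```shell
# Hypothetical helper: build the host name of an ECR private registry.
# Assumes a standard commercial Region (other partitions use a different domain suffix).
ecr_registry_host() {
  local account_id="$1"
  local region="$2"
  echo "${account_id}.dkr.ecr.${region}.amazonaws.com"
}

# Sketch: create or refresh the regcred Secret for that registry.
# Requires the AWS CLI and kubectl. The function is only defined here, not called.
create_ecr_regcred() {
  local account_id="$1"
  local region="$2"
  local host
  host="$(ecr_registry_host "$account_id" "$region")"
  # --dry-run=client renders the Secret locally; kubectl apply then creates or updates it.
  kubectl create secret docker-registry regcred \
    --docker-server="$host" \
    --docker-username=AWS \
    --docker-password="$(aws ecr get-login-password --region "$region")" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example call (111122223333 is a placeholder account ID):
#   create_ecr_regcred 111122223333 us-east-2
```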

To understand the contents of the regcred Secret, view the Secret in YAML format:

kubectl get secret regcred --output=yaml
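
In the YAML output, the .dockerconfigjson value under data is base64 encoded, so the registry server and user name aren't readable directly. The following sketch shows one way to decode it. The helper name decode_dockerconfig is hypothetical, and the decoded output contains your credentials in plain text, so treat it with care.

```shell
# Hypothetical helper: decode a base64-encoded .dockerconfigjson value from standard input.
decode_dockerconfig() {
  base64 --decode
}

# Example against a cluster (requires kubectl; shown as a comment so nothing runs here):
#   kubectl get secret regcred --output="jsonpath={.data.\.dockerconfigjson}" | decode_dockerconfig
```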

In the following example, a pod needs access to your Docker credentials in regcred:

apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
  - name: private-reg-container
    image: <your-private-image>
  imagePullSecrets:
  - name: regcred

Replace <your-private-image> with the path to an image in your private registry, similar to the following:

your.private.registry.example.com/bob/bob-private:v1

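To pull an image from a private registry, Kubernetes needs credentials. The imagePullSecrets field in the configuration file specifies that Kubernetes must get the credentials from a Secret named regcred.

For more options to create a Secret, see Create a Pod that uses a Secret to pull an image on the Kubernetes website.

3. Registry troubleshooting

In the following example, the registry is inaccessible because of a network connectivity issue. The kubelet can't reach the registry endpoint: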

$ kubectl describe pods nginx-9cc69448d-vgm4m

...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  16m                default-scheduler  Successfully assigned default/nginx-9cc69448d-vgm4m to ip-192-168-149-143.us-east-2.compute.internal
  Normal   Pulling    15m (x3 over 16m)  kubelet            Pulling image "nginx:stable"
  Warning  Failed     15m (x3 over 16m)  kubelet            Failed to pull image "nginx:stable": rpc error: code = Unknown desc = Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     15m (x3 over 16m)  kubelet            Error: ErrImagePull
  Normal   BackOff    14m (x4 over 16m)  kubelet            Back-off pulling image "nginx:stable"
  Warning  Failed     14m (x4 over 16m)  kubelet            Error: ImagePullBackOff

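The error message "Failed to pull image..." indicates that the kubelet tried to connect to the Docker registry endpoint and failed because the connection timed out.

To resolve this error, check that your subnets, security groups, and network ACLs allow communication with the specified registry endpoint.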
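
Because the kubelet pulls images from the worker node, a quick way to separate a network problem from a registry problem is to request the registry endpoint from that node. The following is a minimal sketch, assuming curl is installed on the node. The function name registry_reachable and the 10-second default timeout are hypothetical choices. Any HTTP status, including 401 for an unauthenticated request, shows that the network path works. A timeout points back to subnets, security groups, or network ACLs.

```shell
# Hypothetical helper: succeed if any HTTP response arrives from the URL within the timeout.
registry_reachable() {
  local url="$1"
  local timeout_seconds="${2:-10}"
  local status
  # curl exits non-zero on DNS failure, refused connection, or timeout.
  status="$(curl --silent --max-time "$timeout_seconds" --output /dev/null \
    --write-out '%{http_code}' "$url")" || return 1
  echo "HTTP ${status} from ${url}"
}

# Example, run on the worker node (shown as a comment so nothing runs here):
#   registry_reachable https://registry-1.docker.io/v2/ || echo "registry endpoint not reachable"
```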

In the following example, the registry rate limit is exceeded:

$ kubectl describe pod nginx-6bf9f7cf5d-22q48

...
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               3m54s                 default-scheduler  Successfully assigned default/nginx-6bf9f7cf5d-22q48 to ip-192-168-153-54.us-east-2.compute.internal
  Warning  FailedCreatePodSandBox  3m33s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "82065dea585e8428eaf9df89936653b5ef12b53bef7f83baddb22edc59cd562a" network for pod "nginx-6bf9f7cf5d-22q48": networkPlugin cni failed to set up pod "nginx-6bf9f7cf5d-22q48_default" network: add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox  2m53s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "20f2e27ba6d813ffc754a12a1444aa20d552cc9d665f4fe5506b02a4fb53db36" network for pod "nginx-6bf9f7cf5d-22q48": networkPlugin cni failed to set up pod "nginx-6bf9f7cf5d-22q48_default" network: add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox  2m35s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d9b7e98187e84fed907ff882279bf16223bf5ed0176b03dff3b860ca9a7d5e03" network for pod "nginx-6bf9f7cf5d-22q48": networkPlugin cni failed to set up pod "nginx-6bf9f7cf5d-22q48_default" network: add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox  2m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c02c8b65d7d49c94aadd396cb57031d6df5e718ab629237cdea63d2185dbbfb0" network for pod "nginx-6bf9f7cf5d-22q48": networkPlugin cni failed to set up pod "nginx-6bf9f7cf5d-22q48_default" network: add cmd: failed to assign an IP address to container
  Normal   SandboxChanged          119s (x4 over 3m13s)  kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                 56s (x3 over 99s)     kubelet            Pulling image "httpd:latest"
  Warning  Failed                  56s (x3 over 99s)     kubelet            Failed to pull image "httpd:latest": rpc error: code = Unknown desc = Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
  Warning  Failed                  56s (x3 over 99s)     kubelet            Error: ErrImagePull
  Normal   BackOff                 43s (x4 over 98s)     kubelet            Back-off pulling image "httpd:latest"
  Warning  Failed                  43s (x4 over 98s)     kubelet            Error: ImagePullBackOff

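For anonymous usage, the Docker registry rate limit is 100 container image requests per six hours. For Docker accounts, the rate limit is 200 container image requests per six hours. Image requests that exceed these limits are denied until the six-hour window elapses. To manage your usage and understand registry rate limits, see Understanding your Docker Hub rate limit on the Docker website.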
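
Docker Hub reports the current limit in response headers such as ratelimit-limit and ratelimit-remaining, with values like 100;w=21600 (a count and a window in seconds). The following sketch turns such a value into readable text. The helper name describe_ratelimit is hypothetical, and the commented request follows the procedure in Docker's documentation as an assumption. Verify the endpoint against the current Docker documentation before you rely on it.

```shell
# Hypothetical helper: turn a header value such as "100;w=21600" into readable text.
describe_ratelimit() {
  local value="$1"
  local count="${value%%;*}"            # text before the first ";"
  local window_seconds="${value##*w=}"  # text after "w="
  echo "${count} pulls per $((window_seconds / 3600)) hours"
}

# Example request for an anonymous client (requires curl and jq; shown as a comment):
#   token="$(curl --silent "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)"
#   curl --silent --head -H "Authorization: Bearer ${token}" \
#     https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | grep -i '^ratelimit'
```

The limit is counted per source IP address for anonymous pulls, so nodes that share a NAT gateway also share one quota.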


Related information

Amazon EKS troubleshooting

How do I troubleshoot Amazon ECR issues with Amazon EKS?

Security best practices for Amazon EKS

AWS OFFICIAL
Updated 1 year ago