Why can't I run kubectl commands in Amazon EKS?

I can't successfully run kubectl commands, such as kubectl exec, kubectl logs, kubectl attach, or kubectl port-forward, in Amazon Elastic Kubernetes Service (Amazon EKS).

Resolution

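Typically, kubectl commands fail in an Amazon EKS cluster because the API server can't communicate with the kubelet that runs on the worker nodes. Commonly affected commands include kubectl exec, kubectl logs, kubectl attach, and kubectl port-forward.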

To troubleshoot this issue, verify the following:

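  • Pods are running in a secondary Classless Inter-Domain Routing (CIDR) range.
  • The security groups for the control plane and the nodes follow best practices for inbound and outbound rules.
  • The aws-auth ConfigMap has the correct AWS Identity and Access Management (IAM) role with the Kubernetes username that's associated with your node.
  • The requirement to submit a new certificate is fulfilled.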

Pods are running in a secondary Classless Inter-Domain Routing (CIDR) range

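Amazon EKS can't immediately communicate with nodes that are launched in subnets from CIDR blocks that you add to the virtual private cloud (VPC) after the cluster is created. It can take up to five hours for Amazon EKS to recognize the updated range that results from adding CIDR blocks to an existing cluster. For more information, see Amazon EKS VPC and subnet requirements and considerations.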

If your Pods are running in a secondary CIDR range, then do the following:

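  • Wait up to five hours for these commands to start working.
  • Make sure that each subnet has at least five free IP addresses so that the automation can complete successfully.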

Use the following example command to view the available IP addresses for all subnets in a VPC:

[ec2-user@ip-172-31-51-214 ~]$ aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-078af71a874f2f068" | jq '.Subnets[] | .SubnetId + "=" + "\(.AvailableIpAddressCount)"'
"subnet-0d89886ca3fb30074=8186"
"subnet-0ee46aa228bdc9a74=8187"
"subnet-0a0186a277b8b6a51=8186"
"subnet-0d1fb1de0732b5766=8187"
"subnet-077eff87a4e25316d=8187"
"subnet-0f01c02b04708f638=8186"

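The "at least five free IP addresses" check can be automated. The following is a minimal sketch, not an official AWS tool: the function name low_ip_subnets and the default threshold argument are assumptions made for illustration. It reads "subnet-id=count" lines, in the format printed by the command above, and prints only the subnets that fall below the threshold.

```shell
# Print the IDs of subnets whose free-address count is below a threshold.
# Input (stdin): one "subnet-id=count" pair per line; surrounding double
# quotes, as printed by jq without -r, are stripped before comparing.
# Argument 1 (optional): minimum free addresses required (default 5).
# The function name and interface are illustrative assumptions.
low_ip_subnets() {
  awk -F= -v min="${1:-5}" '
    { gsub(/"/, "") }                          # drop JSON string quotes
    NF == 2 && $2 + 0 < min { print $1 }       # report subnets below min
  '
}

# Example use (needs AWS CLI credentials; VPC_ID is a placeholder variable):
# aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" \
#   | jq '.Subnets[] | .SubnetId + "=" + "\(.AvailableIpAddressCount)"' \
#   | low_ip_subnets 5
```

If the function prints nothing, then every listed subnet has at least the requested number of free addresses.
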
The security groups for the control plane and the nodes have the minimum required inbound and outbound rules

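The API server must have the minimum required inbound and outbound rules to make API calls to the kubelet that runs on the worker nodes. To verify that the security groups for the control plane and the nodes have the minimum required inbound and outbound rules, see Amazon EKS security group requirements and considerations.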

The aws-auth ConfigMap has the correct IAM role with the Kubernetes username that's associated with your node

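You must apply the correct IAM role to the aws-auth ConfigMap. Make sure that the IAM role has the Kubernetes username that's associated with your node. To apply the aws-auth ConfigMap to your cluster, see Adding IAM users or roles to your Amazon EKS cluster.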
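
As a rough way to inspect the mapping, the sketch below scans the ConfigMap YAML for a role ARN and for the node username template system:node:{{EC2PrivateDNSName}} that EKS node role mappings conventionally use. The function name node_role_mapped and the example ARN are hypothetical. The check is a plain text match: it doesn't parse YAML and doesn't confirm that both values belong to the same mapRoles entry.

```shell
# Succeed (exit 0) if the aws-auth ConfigMap YAML on stdin contains a
# "rolearn:" line for the given ARN and the node username template.
# Text match only; the function name is an illustrative assumption.
node_role_mapped() {
  awk -v arn="$1" '
    NF >= 2 && $(NF-1) == "rolearn:" && $NF == arn { role = 1 }
    index($0, "username: system:node:{{EC2PrivateDNSName}}") { user = 1 }
    END { exit !(role && user) }
  '
}

# Example use (the ARN is a placeholder, not a real role):
# kubectl get configmap aws-auth -n kube-system -o yaml \
#   | node_role_mapped "arn:aws:iam::111122223333:role/example-node-role" \
#   && echo "node role is mapped"
```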

The requirement to submit a new certificate is fulfilled

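Amazon EKS clusters require the node's kubelet to submit and rotate its own serving certificate. A certificate error occurs when a serving certificate isn't available.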

1.    Run the following commands to validate the kubelet server certificate:

cd /var/lib/kubelet/pki/
# Use the openssl command to validate the kubelet server certificate
sudo openssl x509 -text -noout -in kubelet-server-current.pem

The output is similar to the following:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            1e:f1:84:62:a3:39:32:c7:30:04:b5:cf:b0:91:6e:c7:bd:5d:69:fb
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=kubernetes
        Validity
            Not Before: Oct 11 19:03:00 2021 GMT
            Not After : Oct 11 19:03:00 2022 GMT
        Subject: O=system:nodes, CN=system:node:ip-192-168-65-123.us-east-2.compute.internal
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:7f:44:c6:95:7e:0f:1e:f8:f8:bf:2e:f8:a9:40:
                    6a:4f:83:0a:e8:89:7b:87:cb:d6:b8:47:4e:8d:51:
                    00:f4:ac:9d:ef:10:e4:97:4a:1b:69:6f:2f:86:df:
                    e0:81:24:c6:62:d2:00:b8:c7:60:da:97:db:da:b7:
                    c3:08:20:6e:70
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                A8:EA:CD:1A:5D:AB:DC:47:A0:93:31:59:ED:05:E8:7E:40:6D:ED:8C
            X509v3 Authority Key Identifier:
                keyid:2A:F2:F7:E8:F6:1F:55:D1:74:7D:59:94:B1:45:23:FD:A1:8C:97:9B

            X509v3 Subject Alternative Name:
                DNS:ec2-3-18-214-69.us-east-2.compute.amazonaws.com, DNS:ip-192-168-65-123.us-east-2.compute.internal, IP Address:192.168.65.123, IP Address:3.18.214.69
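
One field worth reading in this output is the validity window, because an expired serving certificate also causes TLS failures. The following sketch pulls the "Not After" timestamp out of the openssl text output. The function name cert_not_after is an assumption made for illustration.

```shell
# Print the "Not After" timestamp from `openssl x509 -text` output on stdin.
# The function name is an illustrative assumption.
cert_not_after() {
  sed -n 's/^[[:space:]]*Not After[[:space:]]*:[[:space:]]*//p'
}

# Example use on a worker node:
# sudo openssl x509 -text -noout -in kubelet-server-current.pem | cert_not_after
#
# openssl can also test expiry directly; -checkend 0 exits nonzero when the
# certificate has already expired:
# sudo openssl x509 -checkend 0 -noout -in kubelet-server-current.pem
```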

2.    Check the kubelet logs for certificate errors. If you don't see an error, then the requirement to submit a new certificate is fulfilled.

Example of a kubelet log certificate error:

kubelet[8070]: I1021 18:49:21.594143 8070 log.go:184] http: TLS handshake error from 192.168.130.116:38710: no serving certificate available for the kubelet

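**Note:** For more detailed logs, turn on kubelet verbose logging with the --v=4 flag, and then restart the kubelet on the worker node. The verbosity setting and the restart commands are similar to the following: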

# Increase kubelet verbosity by editing this file (set --v=4 for detailed logs)
sudo vi /etc/systemd/system/kubelet.service.d/10-kubelet-args.conf
# The default kubelet verbosity is 2
cat /etc/systemd/system/kubelet.service.d/10-kubelet-args.conf
[Service]
Environment='KUBELET_ARGS=--node-ip=192.168.65.123 --pod-infra-container-image=XXXXXXXXXX.dkr.ecr.us-east-2.amazonaws.com/eks/pause:3.1-eksbuild.1 --v=2'
# Reload the systemd daemon and restart the kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Make sure that the kubelet is in a running state
sudo systemctl status kubelet
# Stream the kubelet logs
journalctl -u kubelet -f
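
Instead of watching the stream by eye, the logs can be filtered for the error text shown in the example above. This is a minimal sketch: the function name count_serving_cert_errors is hypothetical, and the match string is copied from the sample log line, so other certificate errors aren't counted.

```shell
# Count kubelet log lines on stdin that report a missing serving certificate.
# grep -c exits nonzero when the count is 0, so "|| true" keeps the function
# usable under "set -e". The function name is an illustrative assumption.
count_serving_cert_errors() {
  grep -c 'no serving certificate available for the kubelet' || true
}

# Example use on a worker node:
# journalctl -u kubelet --since "1 hour ago" --no-pager | count_serving_cert_errors
```

A result of 0 for a recent time window suggests that the kubelet currently has a serving certificate.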

3.    If you see an error, then verify the kubelet configuration file /etc/kubernetes/kubelet/kubelet-config.json on the worker node. Confirm that the RotateKubeletServerCertificate and serverTLSBootstrap flags are set to true:

"featureGates": {
 "RotateKubeletServerCertificate": true
},
"serverTLSBootstrap": true,

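The two settings can also be checked with a quick text search. The sketch below is an assumption-laden shortcut rather than a JSON parser: the function name kubelet_cert_bootstrap_enabled is hypothetical, and the patterns only look for each key followed by true anywhere in the file.

```shell
# Succeed (exit 0) if both settings appear with the value true in the
# kubelet configuration file. Plain text match, not a JSON parser.
# Argument 1 (optional): path to the file; defaults to the path named in
# step 3. The function name is an illustrative assumption.
kubelet_cert_bootstrap_enabled() {
  cfg=${1:-/etc/kubernetes/kubelet/kubelet-config.json}
  grep -Eq '"RotateKubeletServerCertificate"[[:space:]]*:[[:space:]]*true' "$cfg" &&
    grep -Eq '"serverTLSBootstrap"[[:space:]]*:[[:space:]]*true' "$cfg"
}

# Example use on a worker node:
# kubelet_cert_bootstrap_enabled && echo "server certificate bootstrap is on"
```
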
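4.    Run the following command to confirm that the eks:node-bootstrapper cluster role gives the kubelet the role-based access control (RBAC) permissions that it needs to submit a certificate signing request (CSR):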

$ kubectl get clusterrole eks:node-bootstrapper -o yaml

The output is similar to the following:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"eks.amazonaws.com/component":"node"},"name":"eks:node-bootstrapper"},"rules":[{"apiGroups":["certificates.k8s.io"],"resources":["certificatesigningrequests/selfnodeserver"],"verbs":["create"]}]}
  creationTimestamp: "2021-11-09T10:07:42Z"
  labels:
    eks.amazonaws.com/component: node
  name: eks:node-bootstrapper
  resourceVersion: "199"
  uid: da268bf3-31a3-420a-9a71-414229437b7e
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests/selfnodeserver
  verbs:
  - create

The required RBAC permissions include the following attributes:

- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/selfnodeserver"]
  verbs: ["create"]
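
To compare the cluster role against these attributes without reading the YAML by hand, a text scan along the following lines can help. It is only a sketch: the function name has_selfnodeserver_rule is hypothetical, it expects the block-style YAML list items that kubectl prints with -o yaml, and it doesn't confirm that the three values sit in the same rule.

```shell
# Succeed (exit 0) if ClusterRole YAML on stdin lists the API group, the
# resource, and the verb as block-style list items ("- value").
# Text match only; the function name is an illustrative assumption.
has_selfnodeserver_rule() {
  awk '
    $1 == "-" && $2 == "certificates.k8s.io" { group = 1 }
    $1 == "-" && $2 == "certificatesigningrequests/selfnodeserver" { res = 1 }
    $1 == "-" && $2 == "create" { verb = 1 }
    END { exit !(group && res && verb) }
  '
}

# Example use:
# kubectl get clusterrole eks:node-bootstrapper -o yaml | has_selfnodeserver_rule \
#   && echo "CSR permission is present"
```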

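5.    Run the following command to check that the eks:node-bootstrapper cluster role is bound to system:bootstrappers and system:nodes. This binding allows the kubelet to submit and rotate its own serving certificate.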

$ kubectl get clusterrolebinding eks:node-bootstrapper -o yaml

The output is similar to the following:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"labels":{"eks.amazonaws.com/component":"node"},"name":"eks:node-bootstrapper"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"eks:node-bootstrapper"},"subjects":[{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:bootstrappers"},{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:nodes"}]}
  creationTimestamp: "2021-11-09T10:07:42Z"
  labels:
    eks.amazonaws.com/component: node
  name: eks:node-bootstrapper
  resourceVersion: "198"
  uid: f6214fe0-8258-4571-a7b9-ff3455add7b9
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eks:node-bootstrapper
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
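
The binding check in step 5 can be reduced to a yes-or-no answer with a small filter over the same YAML. This is a sketch under stated assumptions: the function name binds_node_groups is hypothetical, and the filter only looks for "name:" lines with the two group names, without checking the kind or apiGroup of each subject.

```shell
# Succeed (exit 0) if ClusterRoleBinding YAML on stdin names both
# system:bootstrappers and system:nodes on "name:" lines.
# Text match only; the function name is an illustrative assumption.
binds_node_groups() {
  awk '
    $1 == "name:" && $2 == "system:bootstrappers" { boot = 1 }
    $1 == "name:" && $2 == "system:nodes" { nodes = 1 }
    END { exit !(boot && nodes) }
  '
}

# Example use:
# kubectl get clusterrolebinding eks:node-bootstrapper -o yaml | binds_node_groups \
#   && echo "both node groups are bound"
```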

AWS Official, updated 1 year ago