EKS Auto Mode - metrics server

Hi,

I launched an auto mode cluster, and there was 1 add-on: metrics server. Everything was as basic as possible, using "Quick Configuration (with EKS Auto Mode) - new"

The metrics server failed to launch, with these errors:

  Warning  FailedCreatePodSandBox  12m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "443df11c5559a82faf322656c5ad9631128288f8eb7462f039cffb6776ed491a": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: Error received from AddNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused"
  Warning  Failed                  11m                    kubelet            Failed to pull image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/metrics-server:v0.7.2-eksbuild.2": rpc error: code = DeadlineExceeded desc = failed to pull and unpack image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/metrics-server:v0.7.2-eksbuild.2": failed to resolve reference "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/metrics-server:v0.7.2-eksbuild.2": failed to do request: Head "https://602401143452.dkr.ecr.us-east-1.amazonaws.com/v2/eks/metrics-server/manifests/v0.7.2-eksbuild.2": dial tcp 54.235.249.60:443: i/o timeout

I repeated the process on another cluster and got the same result.

Question 1: any ideas about the metrics server errors?

Question 2: This leads to another topic. The above errors might be related to the VPC CNI plugin. With that in mind, "auto mode" automatically provides the CNI plugin, right? Yet the console still allows me to choose kube-proxy, CoreDNS, and the CNI plugin as "add-ons". I did not select them. If those services are automatically installed by auto mode, shouldn't those add-ons be permanently disabled in the console (as long as auto mode is enabled, that is)? What would happen if they ran both as add-ons and through auto mode at the same time?

asked 2 months ago · 275 views
2 Answers

Regarding your questions about EKS Auto Mode and the metrics server errors:

  1. The metrics server errors you're experiencing could be related to networking issues within your cluster. The error messages suggest problems with creating pod sandboxes and pulling images, which might be caused by misconfigured networking or issues with the CNI plugin. Some potential causes could be:

    • Incorrect VPC or subnet configuration
    • Issues with the AWS CNI plugin
    • Network connectivity problems between nodes and the ECR repository

To troubleshoot, you could try:

  • Checking your VPC and subnet configurations
  • Verifying that your nodes have the necessary permissions to pull images from ECR
  • Ensuring that your nodes can communicate with the Kubernetes API server
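
As a rough way to test the ECR connectivity cause listed above, the following sketch attempts a plain TCP connection to the registry host named in the image-pull error. It is an illustration only: the function name `can_reach` is hypothetical, and it assumes you can run Python on a host in the same subnet as the nodes (for example, a temporary EC2 instance). A result of `False` would be consistent with the `i/o timeout` in the event log, but it does not say which part of the network path is at fault.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not ECR authentication or image availability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Example call, using the registry host from the error message in the question:
# can_reach("602401143452.dkr.ecr.us-east-1.amazonaws.com")
```
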
  2. Regarding the CNI plugin and add-ons in EKS Auto Mode:

    Yes, EKS Auto Mode automatically provides and manages core add-ons, including the CNI plugin, kube-proxy, and CoreDNS. These are considered part of the managed infrastructure in Auto Mode.

    The console still lets you select these add-ons, which might be a user interface inconsistency or oversight. In theory, they should be disabled or shown as already managed when Auto Mode is enabled.

    If you were to manually install these add-ons in addition to what Auto Mode provides, it could potentially lead to conflicts or unexpected behavior. It's best to rely on Auto Mode's management of these core components and avoid installing these specific add-ons manually.

Sources
Managed Kubernetes – Amazon Elastic Kubernetes Service (EKS) Auto Mode - AWS
Automate cluster infrastructure with EKS Auto Mode - Amazon EKS
EKS Auto Mode starts with 0 nodes & can't schedule new nodes | AWS re:Post

answered 2 months ago

This is probably the answer:

https://docs.aws.amazon.com/eks/latest/userguide/network-reqs.html

  • If you plan to deploy nodes to a public subnet, the subnet must auto-assign IPv4 public addresses or IPv6 addresses.

  • If the subnet that you deploy a node to is a private subnet and its route table doesn’t include a route to a network address translation (NAT) device (IPv4) or an egress-only gateway (IPv6), add VPC endpoints using AWS PrivateLink to your VPC.

I created the VPC using default settings, so neither of the requirements above was configured.
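
The two requirements quoted above can be sketched as a small check. This is a simplified illustration, not an official AWS tool: the function name `subnet_egress_problems` is hypothetical, and the inputs are assumed to be plain dictionaries shaped like the EC2 `DescribeSubnets` and `DescribeRouteTables` responses (fields such as `MapPublicIpOnLaunch`, `GatewayId`, and `NatGatewayId`). It ignores NAT instances, transit gateways, and existing VPC endpoints, so treat its output as a hint only.

```python
def subnet_egress_problems(subnet, routes):
    """Return a list of likely reasons a node in this subnet cannot reach ECR.

    subnet: dict shaped like one entry of "Subnets" from DescribeSubnets.
    routes: list shaped like "Routes" from the subnet's route table.
    Simplified sketch: NAT instances and VPC endpoints are not inspected.
    """
    def is_default(route):
        return (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                or route.get("DestinationIpv6CidrBlock") == "::/0")

    defaults = [r for r in routes if is_default(r)]
    # A default route to an internet gateway makes the subnet "public".
    has_igw = any(str(r.get("GatewayId", "")).startswith("igw-") for r in defaults)
    # A NAT gateway (IPv4) or egress-only gateway (IPv6) gives private egress.
    has_nat = any(r.get("NatGatewayId") or r.get("EgressOnlyInternetGatewayId")
                  for r in defaults)

    problems = []
    if has_igw:
        assigns_ip = (subnet.get("MapPublicIpOnLaunch")
                      or subnet.get("AssignIpv6AddressOnCreation"))
        if not assigns_ip:
            problems.append("public subnet does not auto-assign public IPv4 "
                            "or IPv6 addresses")
    elif not has_nat:
        problems.append("private subnet has no NAT or egress-only gateway "
                        "route; VPC endpoints (AWS PrivateLink) are needed")
    return problems
```

With boto3, the inputs could be obtained from `describe_subnets` and `describe_route_tables` on an EC2 client, then passed to this function one subnet at a time.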

In the EKS console, there are various dashboards. All of them show green.

If you go to View Dashboard -> Node Health Issues, no problems are listed.
The same is true of "Cluster Insights" and "Cluster Health Issues".
The "Networking" tab doesn't show any problems either.
Meanwhile, the VPC and its subnets were the standard defaults, as provided by AWS out of the box.

If a user does the obvious default thing in something called "auto mode", it should either just work or call the problem to the user's attention. Instead, ten or more health checks in the EKS console are green and everything says "success", yet the pods cannot start.

How about a pre-flight check that warns "The node can't access the internet" or "The node can't reach ECR, and here's why: you need to enable auto-assign public IPv4 address on the subnet"? And if it is auto mode, why doesn't it assign a public IPv4 address itself instead of depending on the subnet setting? Auto mode actually seems to prefer private subnets, which is more complicated because it requires a NAT device. I'm not sure which is the best choice, but if it is not feasible to take action on the user's behalf, the console could at least show instructive health-check messages that tell the user what to configure.

answered 2 months ago
