CoreDNS issues after upgrading to 1.29

We recently upgraded from version 1.25 to 1.29. Up until version 1.28, we didn't have any issues. However, after upgrading to version 1.29, our applications suddenly started throwing errors stating that they can't resolve domain names.

Upon inspecting the logs, we discovered that CoreDNS was having problems: with EKS version 1.29 it can no longer reach the Kubernetes API server. We are using Terraform to update the cluster.

Is there something that changed with EKS version 1.29 related to CoreDNS, CNI, or networking in general? Everything worked fine up to version 1.28. Apart from changing the version in Terraform, all configurations remain the same.

I have included the CoreDNS logs for both version 1.28 and 1.29. Any help would be greatly appreciated.

CoreDNS logs with EKS 1.28

.:53
xxxxx.lan.:53
xxxxx.internal.:53
xxxxx.internal.:53
xxxxx.cloud.:53
[INFO] plugin/reload: Running configuration SHA512 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CoreDNS-1.10.1
linux/amd64, go1.21.5, 34742fdd
[INFO] 172.31.141.181:51238 - 25614 "A IN domain1.c.xxxxx.internal. udp 70 false 512" NOERROR qr,rd,ra 138 0.014623223s
[INFO] 172.31.141.181:51238 - 5896 "AAAA IN domain2.c.xxxxx.internal. udp 70 false 512" NOERROR qr,rd,ra 175 2.018777708s
[INFO] 172.31.141.181:38271 - 10889 "A IN domain3.c.xxxxx.internal. udp 70 false 512" NOERROR qr,rd,ra 138 0.013623023s

CoreDNS logs with EKS 1.29

[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
CoreDNS-1.11.1
linux/amd64, go1.21.5, e8fa22a0
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1010975873]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (26-Feb-2024 15:40:26.542) (total time: 30000ms):
Trace[1010975873]: ---"Objects listed" error:Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout 30000ms (15:40:56.543)
Trace[1010975873]: [30.000813318s] [30.000813318s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.100.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1482381721]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (26-Feb-2024 15:40:26.541) (total time: 30004ms):
Trace[1482381721]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout 30004ms (15:40:56.546)
Trace[1482381721]: [30.004177736s] [30.004177736s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.100.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.100.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[748359720]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (26-Feb-2024 15:40:26.541) (total time: 30005ms):
Trace[748359720]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout 30005ms (15:40:56.547)
Trace[748359720]: [30.005153615s] [30.005153615s] END
Rajat
asked 2 months ago

1 Answer

The error message "i/o timeout" indicates that CoreDNS cannot establish a connection to the Kubernetes API server within the client timeout (about 30 seconds in your traces).

Here are some steps you can take to troubleshoot and resolve this issue:

  • Check the Kubernetes API server: Confirm that the API server is running and reachable from inside the cluster. You can test this from a worker node, or from a temporary pod, using kubectl or curl (see the connectivity check sketched after this list).
  • Verify the network configuration: Review network policies, security groups, and VPC settings that could block traffic between the CoreDNS pods and the API server endpoint (10.100.0.1:443 in your logs). Pay particular attention to the cluster security group rules for traffic between the nodes and the control plane.
  • Check the CoreDNS configuration: Review the Corefile for misconfigurations or typos that could prevent the kubernetes plugin from reaching the API server.
  • Restart the CoreDNS pods: Restarting the deployment recreates the pods and forces them to re-establish their watches against the API server.
  • Check the Kubernetes API server logs: On EKS, enable control plane logging and review the API server logs in CloudWatch for errors around the time CoreDNS loses connectivity.
  • Upgrade CoreDNS: If the CoreDNS version you are running is not one of the versions published for EKS 1.29, update the add-on to a supported version (see the add-on commands after this list).
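
A rough connectivity check, run from inside the cluster, might look like the following. The pod name and cluster name are placeholders; any HTTP response (even a 401/403) proves the API server is reachable, while an i/o timeout like the one in your logs points at networking or security groups.

# Try to reach the API server service IP from a throwaway pod
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- curl -k -m 5 https://10.100.0.1:443/healthz

# Look up the cluster security group that controls node-to-control-plane traffic
aws eks describe-cluster --name <your-cluster> --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId'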
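
To review the Corefile CoreDNS is actually running with and to bounce the deployment, you can use the default EKS object names (adjust them if yours differ):

# Show the live Corefile
kubectl -n kube-system get configmap coredns -o yaml

# Recreate the CoreDNS pods and watch them come back up
kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system get pods -l k8s-app=kube-dns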
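
If CoreDNS is managed as an EKS add-on, you can compare the running version against the versions published for 1.29 and update it. Here <your-cluster> and <supported-version> are placeholders to fill in:

# Current add-on version in the cluster
aws eks describe-addon --cluster-name <your-cluster> --addon-name coredns

# CoreDNS add-on versions published for Kubernetes 1.29
aws eks describe-addon-versions --addon-name coredns --kubernetes-version 1.29

# Update the add-on to a supported version
aws eks update-addon --cluster-name <your-cluster> --addon-name coredns --addon-version <supported-version>
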
answered 2 months ago
