We recently upgraded our EKS cluster from version 1.25 to 1.29. Through version 1.28 we had no issues. After upgrading to 1.29, however, our applications suddenly started throwing errors saying they can't resolve domain names.
Upon inspecting the logs, we discovered that on EKS 1.29 CoreDNS cannot reach the Kubernetes API endpoint: its requests to https://10.100.0.1:443 fail with i/o timeouts. We are using Terraform to update the cluster.
Is there something that changed with EKS version 1.29 related to CoreDNS, CNI, or networking in general? Everything worked fine up to version 1.28. Apart from changing the version in Terraform, all configurations remain the same.
I have included the CoreDNS logs for both version 1.28 and 1.29. Any help would be greatly appreciated.
CoreDNS logs with EKS 1.28
.:53
xxxxx.lan.:53
xxxxx.internal.:53
xxxxx.internal.:53
xxxxx.cloud.:53
[INFO] plugin/reload: Running configuration SHA512 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CoreDNS-1.10.1
linux/amd64, go1.21.5, 34742fdd
[INFO] 172.31.141.181:51238 - 25614 "A IN domain1.c.xxxxx.internal. udp 70 false 512" NOERROR qr,rd,ra 138 0.014623223s
[INFO] 172.31.141.181:51238 - 5896 "AAAA IN domain2.c.xxxxx.internal. udp 70 false 512" NOERROR qr,rd,ra 175 2.018777708s
[INFO] 172.31.141.181:38271 - 10889 "A IN domain3.c.xxxxx.internal. udp 70 false 512" NOERROR qr,rd,ra 138 0.013623023s
CoreDNS logs with EKS 1.29
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
CoreDNS-1.11.1
linux/amd64, go1.21.5, e8fa22a0
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1010975873]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (26-Feb-2024 15:40:26.542) (total time: 30000ms):
Trace[1010975873]: ---"Objects listed" error:Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout 30000ms (15:40:56.543)
Trace[1010975873]: [30.000813318s] [30.000813318s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.100.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1482381721]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (26-Feb-2024 15:40:26.541) (total time: 30004ms):
Trace[1482381721]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout 30004ms (15:40:56.546)
Trace[1482381721]: [30.004177736s] [30.004177736s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.100.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.100.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[748359720]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (26-Feb-2024 15:40:26.541) (total time: 30005ms):
Trace[748359720]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.100.0.1:443: i/o timeout 30005ms (15:40:56.547)
Trace[748359720]: [30.005153615s] [30.005153615s] END
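
The errors above are plain TCP dial timeouts to 10.100.0.1:443, so the same symptom should be reproducible without CoreDNS by attempting a bare TCP connect from a pod. Here is a minimal sketch of such a check, assuming Python is available in a debug pod. The can_connect helper and the 5-second timeout are illustrative choices, not something from our setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 10.100.0.1:443 is the Kubernetes API service address seen in the CoreDNS logs.
    host, port = "10.100.0.1", 443
    status = "reachable" if can_connect(host, port) else "unreachable (timeout or refused)"
    print(f"{host}:{port} is {status}")
```

Running it from a pod on the same node as a failing CoreDNS pod would show whether the timeout is specific to CoreDNS or affects other pods on that node as well.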