How do I troubleshoot DNS failures with Amazon EKS?
The applications or Pods that use CoreDNS in my Amazon Elastic Kubernetes Service (Amazon EKS) cluster fail internal or external DNS name resolution.
Short description
Pods that run inside the Amazon EKS cluster use the CoreDNS cluster IP address as the name server to query internal and external DNS records. If there are issues with the CoreDNS Pods, service configuration, or connectivity, then applications might fail DNS resolution.
The kube-dns service object fronts the CoreDNS Pods. To troubleshoot issues with your CoreDNS Pods, verify that all the kube-dns service components work, such as the service endpoints and the iptables rules.
Resolution
Note: In the following resolution, the CoreDNS ClusterIP value is 10.100.0.10.
To check your DNS configuration, complete the following steps:
- To get the ClusterIP of your CoreDNS service, run the following command:
kubectl get service kube-dns -n kube-system
- To verify that the DNS endpoints are exposed and point to the CoreDNS Pods, run the following command:
kubectl -n kube-system get endpoints kube-dns
Example output:
NAME       ENDPOINTS                                                        AGE
kube-dns   192.168.2.218:53,192.168.3.117:53,192.168.2.218:53 + 1 more...   90d
Note: If the endpoint list is empty, then check the Pod status of the CoreDNS Pods.
- Confirm that your security groups and network access control list (network ACL) don't block the Pods when they communicate with CoreDNS.
For more information, see Why won't my pods connect to other pods in Amazon EKS?
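If you check the endpoints often, the following minimal shell sketch turns the empty-list check into a pass or fail result. It assumes that you pass it the ready endpoint IP addresses as a space-separated list, which `kubectl -n kube-system get endpoints kube-dns -o jsonpath='{.subsets[*].addresses[*].ip}'` prints. The function name check_dns_endpoints is hypothetical. Pods that aren't ready are listed under notReadyAddresses instead, so this sketch doesn't count them.

```shell
#!/bin/sh
# check_dns_endpoints (hypothetical helper): reports whether the kube-dns
# Service has any ready endpoint IP addresses.
# Argument: a space-separated list of IP addresses, for example the output of
#   kubectl -n kube-system get endpoints kube-dns \
#     -o jsonpath='{.subsets[*].addresses[*].ip}'
check_dns_endpoints() {
  local ips="$1"
  local count=0
  local ip
  # Count the IP addresses (word splitting on spaces is intended here).
  for ip in $ips; do
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "EMPTY: kube-dns has no ready endpoints; check the CoreDNS Pod status"
    return 1
  fi
  echo "OK: kube-dns has $count ready endpoint(s): $ips"
}
```

Example usage:
check_dns_endpoints "$(kubectl -n kube-system get endpoints kube-dns -o jsonpath='{.subsets[*].addresses[*].ip}')"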
Verify that the kube-proxy Pod works
To check whether the kube-proxy Pod can reach the API server for your cluster, check its logs for timeout errors to the control plane. Also, check for 403 Forbidden errors, which indicate that kube-proxy isn't authorized to perform a request.
To get the kube-proxy logs, run the following command:
kubectl logs -n kube-system --selector 'k8s-app=kube-proxy'
Note: The kube-proxy gets the endpoints from the control plane and creates the iptables rules on each node.
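To filter a long kube-proxy log for these errors, you can pipe it through a small shell function. This is a sketch under assumptions. The function name scan_kube_proxy_logs is hypothetical, and the case-insensitive patterns (timeout, timed out, forbidden, unauthorized) are guesses at typical wording. They aren't an exhaustive list of kube-proxy messages.

```shell
#!/bin/sh
# scan_kube_proxy_logs (hypothetical helper): reads kube-proxy log lines on
# stdin and prints only the lines that look like control plane timeouts or
# authorization failures. The patterns are assumptions; adjust them to the
# messages that your kube-proxy version actually writes.
# Exit status is grep's: 0 if at least one line matched, 1 if none matched.
scan_kube_proxy_logs() {
  grep -E -i 'timeout|timed out|forbidden|unauthorized'
}
```

Example usage. When you use a selector, kubectl logs shows only the last 10 lines per container by default, so --tail=-1 requests all lines:
kubectl logs -n kube-system --selector 'k8s-app=kube-proxy' --tail=-1 | scan_kube_proxy_logs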
Check the CoreDNS Pod CPU usage at the time of the issue
The Amazon EKS CoreDNS add-on sets only a 170 MiB memory limit on the CoreDNS Pod. The CoreDNS Pod doesn't define a CPU limit, so the container can use all the available CPU resources on the node where it runs. If the node's CPU utilization reaches 100%, then the CoreDNS Pod might not have enough CPU resources to handle all DNS queries. As a result, you might get DNS timeout errors in your Amazon EKS application logs.
To check the current CPU and memory usage of the CoreDNS Pods, run the following command:
kubectl top pods -n kube-system -l k8s-app=kube-dns
To check the current CPU and memory usage of the Amazon EKS cluster nodes, run the following command:
kubectl top nodes
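To find nodes that are close to CPU saturation, the following sketch filters the output of kubectl top nodes by a CPU% threshold. It assumes the default column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%). It also assumes that the Kubernetes Metrics Server is installed, because kubectl top depends on it. The function name busy_nodes and the default threshold of 90 are arbitrary choices for illustration.

```shell
#!/bin/sh
# busy_nodes (hypothetical helper): reads `kubectl top nodes` output on stdin
# and prints the name and CPU% of every node at or above THRESHOLD percent.
# Assumes the default columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%.
# Usage: busy_nodes [THRESHOLD]   (THRESHOLD defaults to 90)
busy_nodes() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR == 1 && $1 == "NAME" { next }        # skip the header row
    {
      cpu = $3
      sub(/%$/, "", cpu)                    # "97%" -> "97"
      # Skip values such as <unknown>; compare the rest numerically.
      if (cpu ~ /^[0-9]+$/ && cpu + 0 >= t + 0) print $1, $3
    }'
}
```

Example usage:
kubectl top nodes | busy_nodes 90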
Connect to the application Pod to troubleshoot the DNS issue
Complete the following steps:
- To open a shell inside your application Pod, run the following command:
kubectl exec -it your-pod-name -- sh
Note: Replace your-pod-name with your Pod name.
If the application Pod's image doesn't include a shell binary, then you receive an error similar to the following example:
"OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown command terminated with exit code 126"
To resolve this issue, update your pod-manifest.yaml manifest file to use an image that includes a shell. An example image is busybox on the Docker website.
- To verify that the kube-dns service's cluster IP address is in your Pod's /etc/resolv.conf file, run the following command in the Pod shell:
cat /etc/resolv.conf
The following example resolv.conf file shows a Pod that's configured to point to 10.100.0.10 for DNS requests. The IP address must match the ClusterIP value of your kube-dns service:
nameserver 10.100.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
Note: You can manage your Pod's DNS configuration with the dnsPolicy field in the Pod specification. If you don't populate this field, then the Pod uses the ClusterFirst DNS policy by default. For more information about the ClusterFirst DNS policy, see Pod's DNS policy on the Kubernetes website.
- To verify that your Pod can use the default ClusterIP value to resolve an internal domain, run the following command in the Pod shell:
nslookup kubernetes.default 10.100.0.10
Example output:
Server:    10.100.0.10
Address:   10.100.0.10#53
Name:      kubernetes.default.svc.cluster.local
Address:   10.100.0.1
- To verify that your Pod can use the default ClusterIP value to resolve an external domain, run the following command in the Pod shell:
nslookup amazon.com 10.100.0.10
Example output:
Server:    10.100.0.10
Address:   10.100.0.10#53
Non-authoritative answer:
Name:      amazon.com
Address:   176.32.98.166
Name:      amazon.com
Address:   205.251.242.103
Name:      amazon.com
Address:   176.32.103.205
- To get the kube-dns endpoints, run the following command:
kubectl get endpoints kube-dns -n kube-system
- To verify that your Pod can resolve names directly through a CoreDNS Pod IP address, run the following commands in the Pod shell:
nslookup kubernetes.default COREDNS_POD_IP
nslookup amazon.com COREDNS_POD_IP
Note: Replace COREDNS_POD_IP with one of the kube-dns endpoint IP addresses.
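You can also script the resolv.conf comparison inside the Pod. This sketch assumes a POSIX-style sh with awk, such as the one in the busybox image. It also assumes that the first nameserver line is the one that the Pod uses. The function name check_resolv_nameserver is hypothetical.

```shell
#!/bin/sh
# check_resolv_nameserver (hypothetical helper): compares the first nameserver
# in a resolv.conf file with the expected kube-dns ClusterIP.
# Usage: check_resolv_nameserver EXPECTED_IP [FILE]
#   FILE defaults to /etc/resolv.conf.
check_resolv_nameserver() {
  local expected="$1"
  local file="${2:-/etc/resolv.conf}"
  local actual
  # Take the address from the first "nameserver" line only.
  actual=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
  if [ "$actual" = "$expected" ]; then
    echo "OK: nameserver $actual matches the kube-dns ClusterIP"
  else
    echo "MISMATCH: nameserver is '${actual:-none}', expected $expected"
    return 1
  fi
}
```

Example usage in the Pod shell, with the ClusterIP from this article:
check_resolv_nameserver 10.100.0.10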
Get more detailed logs from CoreDNS Pods to debug further issues
Complete the following steps:
- To activate CoreDNS debug logging, add the log plugin to the CoreDNS ConfigMap. To open the ConfigMap in an editor, run the following command:
kubectl -n kube-system edit configmap coredns
Note: For more information, see log on the CoreDNS website.
- In the command output's editor screen, add the log line to the Corefile, as shown in the following example:
kind: ConfigMap
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log    # Activating CoreDNS Logging
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        ...
...
Note: It takes several minutes to reload the CoreDNS configuration. To immediately apply the changes, restart the Pods one by one.
- To check whether the CoreDNS Pods receive traffic from the application Pod or log failures, run the following command:
kubectl logs --follow -n kube-system --selector 'k8s-app=kube-dns'
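When many Pods send queries, it helps to isolate the requests from one application Pod. The following sketch filters CoreDNS log lines by client IP address and prints the queried name and the response code. It assumes the log plugin's default line format, which the example in the comment shows. It doesn't handle IPv6 client addresses. The function name coredns_queries_from is hypothetical.

```shell
#!/bin/sh
# coredns_queries_from (hypothetical helper): reads CoreDNS log plugin lines
# on stdin and prints "NAME RCODE" for each query that CLIENT_IP sent.
# Assumes the default log format, for example:
# [INFO] 192.168.3.71:51298 - 65181 "A IN amazon.com. udp 28 false 512" NOERROR qr,rd,ra 106 0.001711104s
# Field 2 is client:port, field 7 is the queried name, field 12 is the rcode.
# IPv6 clients ([addr]:port) aren't handled.
# Usage: coredns_queries_from CLIENT_IP
coredns_queries_from() {
  local client="$1"
  awk -v c="$client" '
    $1 == "[INFO]" {
      split($2, addr, ":")          # "192.168.3.71:33238" -> IP, port
      if (addr[1] == c) print $7, $12
    }'
}
```

Example usage, with the application Pod IP address from the example logs in this article:
kubectl logs -n kube-system --selector 'k8s-app=kube-dns' --tail=-1 | coredns_queries_from 192.168.3.71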
Update the ndots value
The ndots value is the minimum number of dots that a domain name must contain for the Pod's resolver to query it as an absolute name before it tries the search domains. Kubernetes sets ndots to 5 by default. With this value, any name that isn't fully qualified and has fewer than five dots is first queried with each search domain appended to it. This includes external domains that aren't under the cluster.local internal domain.
The following example shows the /etc/resolv.conf file of the application Pod:
nameserver 10.100.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
In the preceding example configuration, a name needs at least five dots for the resolver to query it as an absolute name first. If the Pod makes a DNS resolution call for amazon.com, then your CoreDNS logs look similar to the following example:
[INFO] 192.168.3.71:33238 - 36534 "A IN amazon.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000473434s
[INFO] 192.168.3.71:57098 - 43241 "A IN amazon.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000066171s
[INFO] 192.168.3.71:51937 - 15588 "A IN amazon.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000137489s
[INFO] 192.168.3.71:52618 - 14916 "A IN amazon.com.ec2.internal. udp 41 false 512" NXDOMAIN qr,rd,ra 41 0.001248388s
[INFO] 192.168.3.71:51298 - 65181 "A IN amazon.com. udp 28 false 512" NOERROR qr,rd,ra 106 0.001711104s
Note: NXDOMAIN means that the Pod didn't find the domain record. NOERROR means that the Pod successfully found the domain record.
The resolver appends each search domain to amazon.com in turn, and it queries the absolute domain amazon.com. only at the end. A domain name that ends with a dot (.) is a fully qualified domain name, so the resolver doesn't append search domains to it. For each external domain name query, there might be four or five additional calls that can overwhelm the CoreDNS Pod.
To resolve this issue, set ndots to 1 so that the resolver first queries any name that contains at least one dot as an absolute name. You can set this value in the options of the dnsConfig field in the Pod specification. Or, append a dot at the end of the domain that you query or use. Example:
nslookup example.com.
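To see how ndots changes the number of queries, the following sketch models the search-list expansion that a glibc-style resolver performs. It's a simplified model, not the exact algorithm of every resolver library. For example, musl behaves differently. The function name expand_query is hypothetical. With the resolv.conf values shown earlier in this section, it reproduces the query order in the preceding CoreDNS log.

```shell
#!/bin/sh
# expand_query (hypothetical helper): prints, in order, the names that a
# glibc-style stub resolver queries for NAME, given an ndots value and the
# search domains from /etc/resolv.conf. Simplified model only.
# Usage: expand_query NAME NDOTS [SEARCH_DOMAIN...]
expand_query() {
  local name="$1"
  local ndots="$2"
  local dots domain
  shift 2
  # A trailing dot marks a fully qualified name: query it as-is, no search list.
  case "$name" in
    *.)
      echo "$name"
      return 0
      ;;
  esac
  # Count the dots: number of "."-separated fields minus one.
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  if [ "$dots" -ge "$ndots" ]; then
    # Enough dots: the absolute name is tried first, then the search list.
    echo "$name."
    for domain in "$@"; do echo "$name.$domain."; done
  else
    # Too few dots: every search domain is tried before the absolute name.
    for domain in "$@"; do echo "$name.$domain."; done
    echo "$name."
  fi
}
```

For example, expand_query amazon.com 5 default.svc.cluster.local svc.cluster.local cluster.local ec2.internal prints five names and ends with amazon.com. as the absolute query. With an ndots value of 1, amazon.com. comes first, so an existing external domain resolves on the first query.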
Check the AmazonProvidedDNS VPC resolver quotas
The Amazon Virtual Private Cloud (Amazon VPC) resolver accepts a maximum of 1024 packets per second from each elastic network interface. If more than one CoreDNS Pod is on the same node, then you might reach this quota for external domain queries.
To use PodAntiAffinity rules to schedule CoreDNS Pods on separate instances, add the following options under spec.template.spec.affinity in the CoreDNS deployment:
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname
    weight: 100
Note: For more information about PodAntiAffinity, see Inter-pod affinity and anti-affinity on the Kubernetes website.
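To check whether several CoreDNS Pods share a node before and after you add the anti-affinity rule, you can count the Pods on each node. The following sketch assumes that you pass it one node name per CoreDNS Pod, for example from `kubectl get pods -n kube-system -l k8s-app=kube-dns -o custom-columns=NODE:.spec.nodeName --no-headers`. The function name shared_coredns_nodes is hypothetical.

```shell
#!/bin/sh
# shared_coredns_nodes (hypothetical helper): reads one node name per line on
# stdin (one line per CoreDNS Pod) and prints "NODE COUNT" for every node
# that runs more than one CoreDNS Pod.
shared_coredns_nodes() {
  # uniq -c prints "COUNT NODE"; keep only counts above 1 and swap the order.
  sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}
```

Example usage:
kubectl get pods -n kube-system -l k8s-app=kube-dns -o custom-columns=NODE:.spec.nodeName --no-headers | shared_coredns_nodes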
Use tcpdump to capture CoreDNS packets from Amazon EKS worker nodes
To diagnose DNS resolution issues, complete the following steps to use the tcpdump tool to perform a packet capture:
- To locate a worker node where a CoreDNS Pod is running, run the following command:
kubectl get pod -n kube-system -l k8s-app=kube-dns -o wide
- Use SSH to connect to the worker node. Then, to install the tcpdump tool, run the following command:
sudo yum install -y tcpdump
- To locate the CoreDNS Pod process ID on the worker node, run the following command:
ps ax | grep coredns
- From the worker node, run the following command to perform a packet capture on CoreDNS Pod network traffic on UDP port 53:
sudo nsenter -n -t PID tcpdump udp port 53
Note: Replace PID with the CoreDNS process ID.
- From a separate terminal, run the following command to get the CoreDNS service and Pod IP addresses:
kubectl describe svc kube-dns -n kube-system
Note: Record the service IP address from the IP field and the Pod IP addresses from the Endpoints field.
- Launch a Pod to test the DNS service. The following example uses an Ubuntu container image:
kubectl run ubuntu --image=ubuntu -- sleep 1d
kubectl exec -it ubuntu -- sh
The ubuntu image doesn't include the nslookup tool. To install it, run the following command in the Pod shell:
apt-get update && apt-get install -y dnsutils
- Run the following command to use the nslookup tool to perform a DNS query to the amazon.com domain:
nslookup amazon.com
To explicitly perform the same query against the CoreDNS service IP address, run the following command:
nslookup amazon.com COREDNS_SERVICE_IP
Note: Replace COREDNS_SERVICE_IP with your CoreDNS service IP address.
To perform the query against each CoreDNS Pod IP address, run the following command:
nslookup amazon.com COREDNS_POD_IP
Note: Replace COREDNS_POD_IP with your CoreDNS Pod IP address. If you run multiple CoreDNS Pods, then perform multiple queries. This way, at least one query reaches the Pod that you capture traffic from.
- Review the packet capture results.
If the CoreDNS Pod experiences DNS query timeouts and you don't see the query in the packet capture, then check your network connectivity, including the network reachability between worker nodes.
If you see DNS query timeouts on a Pod IP address that you didn't capture, then perform another packet capture on the related worker node.
To save the results of a packet capture, add the -w FILE_NAME flag to the tcpdump command. The following example writes the results to the capture.pcap file:
sudo nsenter -n -t PID tcpdump -w capture.pcap udp port 53
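The ps ax | grep coredns step also matches the grep process itself. The following sketch extracts only the CoreDNS process IDs so that you can pass each one to nsenter. It assumes the default ps ax columns (PID, TTY, STAT, TIME, COMMAND) and an executable path that ends in coredns. The function name coredns_pids is hypothetical. On hosts with procps, pgrep -x coredns is a simpler alternative.

```shell
#!/bin/sh
# coredns_pids (hypothetical helper): reads `ps ax` output on stdin and prints
# the PID of each process whose executable is coredns. Unlike a plain
# `ps ax | grep coredns`, this skips the grep process's own line.
# Assumes the default `ps ax` columns: PID TTY STAT TIME COMMAND.
coredns_pids() {
  # NR > 1 skips the header; field 5 is the executable, for example /coredns.
  awk 'NR > 1 && $5 ~ /(^|\/)coredns$/ { print $1 }'
}
```

Example usage on the worker node:
for pid in $(ps ax | coredns_pids); do echo "CoreDNS PID: $pid"; done
Then, pass a PID to the capture command, for example sudo nsenter -n -t PID tcpdump -w capture.pcap udp port 53.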
Related information
CoreDNS GA for Kubernetes cluster DNS on the Kubernetes website
AWS OFFICIALUpdated 10 months ago