How does DNS work, and how do I troubleshoot partial or intermittent DNS failures?

I want to troubleshoot partial or intermittent DNS failures.

Resolution

DNS overview

DNS translates easy-to-remember names such as www.example.com into numeric IP addresses such as 192.0.2.1, which clients then use to reach internet applications. This process is called "DNS resolution." For more information, see What is DNS?

Partial, temporary, or intermittent DNS failure scenarios

In some cases, a client experiences DNS failures either for a short period of time or intermittently. The following are common scenarios that might cause partial DNS failure:

Misconfigured name servers at the registrar

Sometimes one or more name servers are misconfigured at the registrar. A whois lookup returns the name servers that are configured at the registrar for the domain. During DNS resolution, if the registered name servers don't respond or respond with unexpected information, then the local resolver returns a SERVFAIL message. In some cases, the local resolver retries the request with a different name server that isn't misconfigured and receives the expected DNS response.

Also, local resolvers can cache the wrong name servers for the duration of the TTL and send the next query to the misconfigured name server.

Altered name servers at the hosted zone

When the NS record for a domain is misconfigured in the hosted zone, a partial DNS failure can occur. Either the existing name servers were updated or additional name servers were added to the value of the NS record. If the resolver tries to resolve the domain using the wrong name server, then you can experience a partial DNS failure.
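
Both of the preceding scenarios come down to a mismatch between two lists of name servers. The following is a minimal sketch, not an AWS tool, that compares the name servers reported by a whois lookup with the values of the NS record in the hosted zone. The function names and the sample inputs are assumptions made for illustration.

```python
def normalize(name_server: str) -> str:
    """Lowercase a name server name and drop the trailing root dot."""
    return name_server.strip().rstrip(".").lower()


def compare_name_servers(registrar_ns, hosted_zone_ns):
    """Return the name servers that appear on only one side."""
    registrar = {normalize(ns) for ns in registrar_ns}
    hosted_zone = {normalize(ns) for ns in hosted_zone_ns}
    return {
        "only_at_registrar": sorted(registrar - hosted_zone),
        "only_in_hosted_zone": sorted(hosted_zone - registrar),
    }


if __name__ == "__main__":
    # Hypothetical inputs: copy the real values from the whois output
    # and from the NS record of the hosted zone.
    registrar = ["ns-1019.awsdns-63.net.", "ns-277.awsdns-34.com."]
    hosted_zone = ["NS-1019.awsdns-63.net", "ns-1178.awsdns-19.org."]
    print(compare_name_servers(registrar, hosted_zone))
```

If both lists in the result are empty, then the two sides agree. Any entry in either list is a candidate for the partial failure.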

Client's DNS resolver can't resolve the domain

Sometimes incorrect resolvers are set in the resolver configuration file, such as resolv.conf in Linux. When you resolve the domain from an Amazon Elastic Compute Cloud (Amazon EC2) instance in an Amazon Virtual Private Cloud (Amazon VPC), the EC2 instance uses the name servers defined in resolv.conf.
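
To confirm which resolvers a Linux host actually uses, you can list the nameserver entries in the resolver configuration file. The following sketch assumes the common resolv.conf layout of one "nameserver" keyword followed by an address per line. The function name is hypothetical.

```python
import os


def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses from 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines that start with '#' or ';'.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    path = "/etc/resolv.conf"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            print(parse_nameservers(handle.read()))
```

Compare the result with the resolver that you expect the instance to use, such as the Amazon-provided DNS server for the VPC.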

Amazon-provided DNS server throttles DNS queries

Amazon-provided DNS servers enforce a limit of 1024 packets per second per elastic network interface and reject any traffic that exceeds this limit. Because of this throttling, DNS queries time out intermittently. To resolve this issue, turn on caching at the instance or increase the DNS retry timer on the application.
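
Caching reduces the number of packets that reach the DNS server because repeated lookups for the same name are answered locally. The following is a minimal application-level sketch of that idea. It isn't the instance-level cache itself, and the class name, the fixed TTL, and the injected lookup function are assumptions made for illustration.

```python
import time


class CachingResolver:
    """Cache lookup results for a fixed number of seconds."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup          # function: hostname -> address
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # hostname -> (address, expiry time)
        self.upstream_queries = 0      # lookups that were not served from cache

    def resolve(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]
        # Cache miss or expired entry: ask the real resolver once.
        self.upstream_queries += 1
        address = self._lookup(hostname)
        self._cache[hostname] = (address, now + self._ttl)
        return address
```

In an application, the lookup function could be socket.gethostbyname from the Python standard library. A real cache would normally honor the TTL of each record instead of a fixed value.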

The domain URL resolves from the internet, but not from the EC2 instance

When a private hosted zone for your domain is associated with the VPC, DNS queries for your domain from that VPC always resolve from the private hosted zone.

If the queried record for your domain isn't present in the private hosted zone, then the DNS query fails. Also, your DNS query isn't forwarded to the public domain. Because the DNS record is present in the public domain zone, it does resolve from the internet.
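
The behavior described above can be pictured with a toy model. This sketch isn't how Route 53 Resolver is implemented. It only illustrates that a name inside the private hosted zone's domain is answered from the private records alone, with no fallback to the public zone. All names, records, and functions here are hypothetical.

```python
def in_zone(name: str, zone_domain: str) -> bool:
    """Return True if name is the zone apex or a name below it."""
    name = name.rstrip(".").lower()
    zone_domain = zone_domain.rstrip(".").lower()
    return name == zone_domain or name.endswith("." + zone_domain)


def resolve_from_vpc(name, zone_domain, private_records, public_records):
    """Toy model of a lookup from a VPC with an associated private hosted zone."""
    key = name.rstrip(".").lower()
    if in_zone(key, zone_domain):
        # The private zone wins for the whole domain. A missing record
        # means the query fails; the public zone is never consulted.
        return private_records.get(key)
    return public_records.get(key)


if __name__ == "__main__":
    private = {"internal.example.com": "10.0.0.5"}
    public = {"www.example.com": "192.0.2.1"}
    print(resolve_from_vpc("www.example.com", "example.com", private, public))
    print(resolve_from_vpc("internal.example.com", "example.com", private, public))
```

In this model, www.example.com fails from the VPC until a matching record is added to the private hosted zone.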

Misconfigured DNS firewall rule in Route 53

If all of the following are true for a domain, then check whether a Route 53 Resolver DNS Firewall rule is configured for that domain:

  • Resolves on the internet
  • Resolves through a public resolver (for example, 1.1.1.1 or 8.8.8.8 as the resolver IP)
  • Doesn't resolve from a virtual private cloud (VPC)

Misconfigured Route 53 resolver endpoints

Route 53 Resolver outbound endpoints and resolver rules can be configured to forward specific DNS queries to an on-premises DNS server. Make sure that the Route 53 Resolver endpoints, the resolver rules, and the on-premises DNS server are configured correctly. For more information, see How do I troubleshoot DNS resolution issues with Route 53 Resolver endpoints?

Troubleshoot DNS failures on Linux-based operating systems

Use the dig command to perform a lookup against the client DNS server that's configured in the host's /etc/resolv.conf file.

$ dig www.amazon.com    
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> www.amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13150
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.amazon.com.    IN    A

;; ANSWER SECTION:
www.amazon.com.        41    IN    A    54.239.17.6

;; Query time: 1 msec
;; SERVER: 10.108.0.2#53(10.108.0.2)
;; WHEN: Fri Oct 21 21:43:11 2016
;; MSG SIZE rcvd: 48

In the preceding example, the answer section shows that the A record for www.amazon.com resolves to 54.239.17.6. If you add the +trace option, then the dig command follows the chain of delegations from the root name servers down to the authoritative name servers for the record. The following is an example of the dig command with the +trace option:

$ dig +trace www.amazon.com    
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> +trace www.amazon.com
;; global options: +cmd
.        518400    IN    NS    J.ROOT-SERVERS.NET.
.        518400    IN    NS    K.ROOT-SERVERS.NET.
.        518400    IN    NS    L.ROOT-SERVERS.NET.
...
;; Received 508 bytes from 10.108.0.2#53(10.108.0.2) in 31 ms

com.        172800    IN    NS    a.gtld-servers.net.
com.        172800    IN    NS    b.gtld-servers.net.
com.        172800    IN    NS    c.gtld-servers.net.
...
;; Received 492 bytes from 193.0.14.129#53(193.0.14.129) in 93 ms
amazon.com.        172800    IN    NS    pdns1.ultradns.net.
amazon.com.        172800    IN    NS    pdns6.ultradns.co.uk.
...
;; Received 289 bytes from 192.33.14.30#53(192.33.14.30) in 201 ms
www.amazon.com.    900    IN    NS    ns-1019.awsdns-63.net.
www.amazon.com.    900    IN    NS    ns-1568.awsdns-04.co.uk.
www.amazon.com.    900    IN    NS    ns-277.awsdns-34.com.
...
;; Received 170 bytes from 204.74.108.1#53(204.74.108.1) in 87 ms

www.amazon.com.    60     IN    A    54.239.26.128
www.amazon.com.    1800   IN    NS   ns-1019.awsdns-63.net.
www.amazon.com.    1800   IN    NS   ns-1178.awsdns-19.org.
...
;; Received 186 bytes from 205.251.195.251#53(205.251.195.251) in 7 ms
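
Each "Received" line in the trace output names the server that answered one step of the delegation and how long it took. The following sketch pulls those lines out of saved trace output so that a slow or unexpected server stands out. It assumes the line format shown in the preceding output, and the function name is hypothetical.

```python
import re

# Matches lines such as:
# ;; Received 508 bytes from 10.108.0.2#53(10.108.0.2) in 31 ms
HOP = re.compile(
    r"^;; Received (\d+) bytes from (\S+?)#(\d+)\(.*?\) in (\d+) ms",
    re.MULTILINE,
)


def trace_hops(trace_output: str):
    """Return (server address, response time in ms) for each step."""
    return [(m.group(2), int(m.group(4))) for m in HOP.finditer(trace_output)]


if __name__ == "__main__":
    sample = (
        ";; Received 508 bytes from 10.108.0.2#53(10.108.0.2) in 31 ms\n"
        ";; Received 492 bytes from 193.0.14.129#53(193.0.14.129) in 93 ms\n"
    )
    for server, millis in trace_hops(sample):
        print(server, millis)
```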

You can also perform a query that returns only the name servers.

$ dig -t NS www.amazon.com
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> -t NS www.amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48631
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.amazon.com.        IN    NS

;; ANSWER SECTION:
www.amazon.com.        490    IN    NS    ns-1019.awsdns-63.net.
www.amazon.com.        490    IN    NS    ns-1178.awsdns-19.org.
www.amazon.com.        490    IN    NS    ns-1568.awsdns-04.co.uk.
www.amazon.com.        490    IN    NS    ns-277.awsdns-34.com.

;; Query time: 0 msec
;; SERVER: 10.108.0.2#53(10.108.0.2)
;; WHEN: Fri Oct 21 21:48:20 2016
;; MSG SIZE rcvd: 170

In the preceding example, www.amazon.com has the following four authoritative name servers:

  • ns-1019.awsdns-63.net.
  • ns-1178.awsdns-19.org.
  • ns-1568.awsdns-04.co.uk.
  • ns-277.awsdns-34.com.

Any of these four servers can authoritatively answer questions about the www.amazon.com hostname. Use the dig command to directly target a specific name server. Check whether every authoritative name server for a given domain answers correctly.
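
Checking every name server by hand is repetitive, so the check can be scripted. The following sketch assumes that dig is installed and on the PATH, and it uses dig's +short option to print only the answer. The function names are hypothetical. Different addresses from different name servers aren't necessarily a problem, so the sketch flags only name servers that return no answer at all.

```python
import subprocess


def dig_command(hostname: str, name_server: str) -> list[str]:
    """Build the argument list for a direct query to one name server."""
    return ["dig", "+short", hostname, "@" + name_server]


def query_each(hostname, name_servers, timeout=10):
    """Query every name server. The value is None when dig fails or times out."""
    results = {}
    for ns in name_servers:
        try:
            done = subprocess.run(
                dig_command(hostname, ns),
                capture_output=True, text=True, timeout=timeout,
            )
            results[ns] = sorted(done.stdout.split())
        except (OSError, subprocess.TimeoutExpired):
            results[ns] = None
    return results


def failing_servers(results) -> list[str]:
    """Return the name servers that gave no answer or could not be queried."""
    return [ns for ns, answer in results.items() if not answer]
```

For example, failing_servers(query_each("www.amazon.com", servers)) returns the subset of servers to investigate, where servers is the list from the NS query.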

The following is example output for a query to www.amazon.com to one of its authoritative name servers (ns-1019.awsdns-63.net). The server response shows that www.amazon.com is available on 54.239.25.192:

$ dig www.amazon.com @ns-1019.awsdns-63.net.
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.49.amzn1 <<>> www.amazon.com @ns-1019.awsdns-63.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31712
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.amazon.com.    IN    A

;; ANSWER SECTION:
www.amazon.com.        60    IN    A    54.239.25.192

;; AUTHORITY SECTION:
www.amazon.com.        1800    IN    NS    ns-1019.awsdns-63.net.
www.amazon.com.        1800    IN    NS    ns-1178.awsdns-19.org.
www.amazon.com.        1800    IN    NS    ns-1568.awsdns-04.co.uk.
...

;; Query time: 7 msec
;; SERVER: 205.251.195.251#53(205.251.195.251)
;; WHEN: Fri Oct 21 21:50:00 2016
;; MSG SIZE rcvd: 186

The following line shows that ns-1019.awsdns-63.net is an authoritative name server for www.amazon.com:

;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

The presence of the aa flag shows that the name server ns-1019.awsdns-63.net gave us an authoritative answer for the resource record www.amazon.com.
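
When you check many responses, it can help to read the status and flags programmatically. The following sketch parses the header lines of saved dig output in the format shown in the preceding examples. The function names are hypothetical.

```python
import re

STATUS = re.compile(r"status: (\w+)")
FLAGS = re.compile(r"^;; flags:([^;]*);", re.MULTILINE)


def parse_header(dig_output: str) -> dict:
    """Extract the response status and the list of header flags."""
    status = STATUS.search(dig_output)
    flags = FLAGS.search(dig_output)
    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
    }


def is_authoritative(dig_output: str) -> bool:
    """Return True when the aa (authoritative answer) flag is set."""
    return "aa" in parse_header(dig_output)["flags"]


if __name__ == "__main__":
    sample = (
        ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31712\n"
        ";; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0\n"
    )
    print(parse_header(sample), is_authoritative(sample))
```

A status other than NOERROR, such as SERVFAIL, or a missing aa flag on a direct query to a name server that should be authoritative, points to the server to investigate.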

Troubleshoot DNS failures on Windows-based operating systems

Use the nslookup utility to return the IP address that's associated with a hostname.

C:\>nslookup www.amazon.com
Server:     ip-10-20-0-2.ec2.internal
Address:    10.20.0.2

Non-authoritative answer:
Name:       www.amazon.com
Address:    54.239.25.192

To determine the authoritative name servers for a hostname using the nslookup utility, use the -type=NS flag:

C:\>nslookup -type=NS www.amazon.com
Server:     ip-10-20-0-2.ec2.internal
Address:    10.20.0.2

Non-authoritative answer:
www.amazon.com    nameserver = ns-277.awsdns-34.com
www.amazon.com    nameserver = ns-1019.awsdns-63.net
www.amazon.com    nameserver = ns-1178.awsdns-19.org
...
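
To query each of these name servers in turn, you can extract their names from the nslookup output and build one command per server. The following sketch assumes the "nameserver =" line format shown in the preceding output. The function names are hypothetical.

```python
def nslookup_name_servers(nslookup_output: str) -> list[str]:
    """Return the name servers listed in 'nameserver =' lines."""
    servers = []
    for line in nslookup_output.splitlines():
        if "nameserver =" in line:
            servers.append(line.split("nameserver =", 1)[1].strip())
    return servers


def nslookup_commands(hostname: str, servers) -> list[list[str]]:
    """Build one 'nslookup <hostname> <server>' argument list per server."""
    return [["nslookup", hostname, server] for server in servers]


if __name__ == "__main__":
    sample = (
        "www.amazon.com    nameserver = ns-277.awsdns-34.com\n"
        "www.amazon.com    nameserver = ns-1019.awsdns-63.net\n"
    )
    for command in nslookup_commands("www.amazon.com", nslookup_name_servers(sample)):
        print(" ".join(command))
```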

To see if ns-277.awsdns-34.com for www.amazon.com responds correctly to a request for www.amazon.com, use the following syntax:

C:\>nslookup www.amazon.com ns-277.awsdns-34.com
Server:     UnKnown
Address:    205.251.193.21

Name:       www.amazon.com
Address:    54.239.25.200

AWS OFFICIAL
Updated 10 months ago