How do I troubleshoot DNS SERVFAIL issues?

I’m getting the “SERVFAIL” response when resolving my domain in Amazon Route 53.

Resolution

Issue: A third-party name server (NS) is blocking the AWS public resolver's IP address

If a third-party NS blocks the AWS public resolver's IP address, then you see SERVFAIL responses when you resolve queries against your public domain. This occurs whether you resolve from one or multiple AWS Regions. However, when you resolve the same DNS query against other public DNS resolvers, such as 8.8.8.8 or 1.1.1.1, you receive the NOERROR response.

To resolve this issue, contact your third-party DNS provider to create an allow list. Add to the list all AWS public resolver IP address ranges from the AWS Region where you observe SERVFAIL responses.
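One way to confirm this pattern is to send the same query to several resolvers and compare the response codes. The following is a minimal sketch that uses only the Python standard library. The function names (`build_query`, `rcode_of`, `query_rcode`) are illustrative and not part of any AWS tooling, and the resolver IP addresses that you pass in are your own choice.

```python
import socket
import struct

# Common DNS response codes (RCODE values from the DNS header).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (default type A, class IN, recursion desired)."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes) -> str:
    """Return the RCODE name stored in the low 4 bits of the header flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def query_rcode(name: str, resolver_ip: str, timeout: float = 3.0) -> str:
    """Send the query over UDP to resolver_ip and return the RCODE name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(4096)
    return rcode_of(response)
```

For example, if `query_rcode("example.com", "8.8.8.8")` returns NOERROR while the same call against the resolver that your AWS resources use returns SERVFAIL, then the result is consistent with the blocking that's described in this section.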

Issue: There's incorrect subdomain delegation configured in a public hosted zone

Example

Your public hosted zone "example.com" has subdomain delegation configured for "aws.example.com". The subdomain delegation configuration specifies unreachable or incorrect name servers that aren't authoritative for the subdomain.

Parent public hosted zone for domain "example.com"

example.com        NS    ns1.example.com, ns2.example.com, ns3.example.com, ns4.example.com
aws.example.com    NS    dummy-ns1.com, dummy-ns2.net, dummy-ns3.co.uk, dummy-ns4.org

Subdomain hosted zone for domain "aws.example.com"

aws.example.com    NS    ns1-xxx.awsdns-xx.com, ns2-xxx.awsdns-xx.co.uk, ns3-xxx.awsdns-xx.net, ns4-xxx.awsdns-xx.org
aws.example.com    A     1.2.3.4

To resolve the preceding error, configure the name server records within the parent hosted zone to match the name servers in the subdomain hosted zone. If you're using custom name servers, then confirm that the name servers are reachable.
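The comparison between the two NS sets can be sketched as follows. This is an illustrative helper, not an AWS API: `delegation_mismatch` is a hypothetical name, and the inputs are plain lists of name server names that you copy from the parent and subdomain hosted zones.

```python
def normalize_ns(ns_names):
    """Lowercase names and drop trailing dots so that equal names compare equal."""
    return {name.strip().rstrip(".").lower() for name in ns_names}


def delegation_mismatch(parent_ns, child_ns):
    """Return (only_in_parent, only_in_child) as two sorted lists.

    Two empty lists mean that the delegation in the parent zone matches
    the NS record in the subdomain hosted zone.
    """
    parent, child = normalize_ns(parent_ns), normalize_ns(child_ns)
    return sorted(parent - child), sorted(child - parent)
```

With the example values from this section, every `dummy-ns` entry appears in the first list and every `awsdns` entry appears in the second list, which shows that the parent zone delegates to the wrong servers.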

Issue: There are incorrect name servers listed with the domain registrar

When incorrect name servers are listed with the domain registrar, there are two causes for receiving SERVFAIL responses:

  • The name servers that are configured at the domain registrar don't match the name servers that are provided in your public hosted zone.
  • The name servers that are configured at the registrar exist but aren't authoritative for the given domain.

If the name servers don't exist, then the resolvers time out after initiating iterative queries. These timeouts cause significant latency in query time. Because the name servers can't provide an answer, the resolver returns the SERVFAIL response.

Domain’s public hosted zone

example.com    NS    ns1.example.com, ns2.example.com, ns3.example.com, ns4.example.com

The name servers are configured at the domain registrar, as shown in the following example:

whois example.com | grep "Name Server"
Name Server: ns1.test.com  
Name Server: ns2.test.com
Name Server: ns3.test.com  
Name Server: ns4.test.com

To resolve this error, take one of the following actions:

  • White-label name servers aren't implemented: Replace the registrar's name servers with the name servers that are assigned to your public hosted zone.
  • White-label name servers are implemented: Make sure that the registrar's name servers are identical to your glue records and the A records for the white-label name servers in the public hosted zone.
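The check against the registrar can be sketched in a few lines. The helper names below are hypothetical, and the parser assumes that the whois output uses `Name Server:` lines like the example in this section. The whois output format varies by registry, so treat the regular expression as an assumption.

```python
import re


def registrar_name_servers(whois_text):
    """Extract name servers from 'Name Server: <host>' lines of whois output."""
    pattern = re.compile(r"^\s*Name Server:\s*(\S+)", re.MULTILINE | re.IGNORECASE)
    return {m.group(1).rstrip(".").lower() for m in pattern.finditer(whois_text)}


def registrar_mismatch(whois_text, hosted_zone_ns):
    """Return (only_at_registrar, only_in_hosted_zone) as two sorted lists."""
    registrar = registrar_name_servers(whois_text)
    zone = {name.rstrip(".").lower() for name in hosted_zone_ns}
    return sorted(registrar - zone), sorted(zone - registrar)
```

For the example in this section, the first list contains the four `test.com` name servers and the second list contains the four `example.com` name servers, so the registrar must be updated.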

Issue: There's unsupported subdomain delegation that's configured in the private hosted zone

If subdomain delegation is configured incorrectly in the private hosted zone, then the virtual private cloud (VPC) DNS resolver returns SERVFAIL.

Private hosted zone

servfail.local        NS    ns-xxx.awsdns-xx.co.uk, ns-x.awsdns-xx.com, ns-xxx.awsdns-xx.org, ns-xxx.awsdns-xx.net.
sub.servfail.local    NS    ns-xxx.awsdns-xx.net.

Note: You can't use the AWS Management Console to create NS records in a private hosted zone to delegate responsibility for a subdomain. Instead, use the AWS Command Line Interface (AWS CLI). However, Amazon Route 53 doesn't support subdomain delegation in private hosted zones.
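To spot delegation records like the `sub.servfail.local` entry in the preceding example, you can scan a zone's record sets for NS records below the zone apex. This is a minimal sketch: `delegated_subdomains` is a hypothetical name, and the input is assumed to be a list of dictionaries with `Name` and `Type` keys, similar in shape to the record sets that Route 53 returns when you list the records in a hosted zone.

```python
def delegated_subdomains(zone_name, record_sets):
    """Return the names of NS records that aren't at the zone apex.

    In a private hosted zone, each returned name is a subdomain
    delegation of the kind that this section describes.
    """
    apex = zone_name.rstrip(".").lower()
    return sorted(
        rs["Name"].rstrip(".").lower()
        for rs in record_sets
        if rs["Type"] == "NS" and rs["Name"].rstrip(".").lower() != apex
    )
```

An empty result means that the zone contains only its apex NS record.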

Issue: DNSSEC is misconfigured

DNSSEC might consist of one or more of the following misconfigurations:

  • DNSSEC is turned on at the domain registrar level but not at the DNS hosting service end.
  • DNSSEC signing is turned on at the domain registrar level and at the DNS hosting service end. However, one or more essential pieces of information (such as key type, signing algorithm, and public key) are mismatched. Or, the DS record is incorrect.
  • The chain of trust between the parent zone and child zone isn't established. The DS record in the parent zone doesn't match the hash of the public KSK in the child zone.

To resolve this issue, see How can I identify and troubleshoot DNSSEC configuration issues in Route 53?
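To illustrate what "the DS record matches the hash of the public KSK" means, the following sketch computes a SHA-256 DS digest and a key tag from DNSKEY fields, following the definitions in RFC 4034 (the digest covers the owner name in wire format followed by the DNSKEY RDATA). The function names are illustrative, the inputs are values that you copy from your own DNSKEY record, and the key tag calculation assumes an algorithm other than 1 (RSA/MD5).

```python
import base64
import hashlib
import struct


def name_to_wire(name):
    """Encode a domain name in lowercase DNS wire format."""
    labels = [label for label in name.rstrip(".").lower().split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def dnskey_rdata(flags, protocol, algorithm, public_key_b64):
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(public_key_b64)


def key_tag(rdata):
    """Compute the key tag over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    total = 0
    for i, byte in enumerate(rdata):
        total += byte if i & 1 else byte << 8
    total += (total >> 16) & 0xFFFF
    return total & 0xFFFF


def ds_sha256_digest(owner, rdata):
    """Return the DS digest (digest type 2, SHA-256) as uppercase hex."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
```

If the digest and key tag that you compute from the child zone's KSK (flags 257) differ from the values in the DS record at the parent, then the chain of trust is broken.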

Issue: Route 53 Resolver inbound and outbound endpoint chaining is misconfigured

DNS traffic that's in a loop causes this issue. The following traffic flow pattern causes the loop:

EC2 instance - VPC DNS resolver - (match forwarding rule) - outbound endpoint - (target IP address of inbound endpoint) - Inbound endpoint - VPC DNS resolver

To resolve this issue, see Avoid loop configurations with Resolver endpoints.
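The loop pattern above can be checked with a simple set comparison. The following sketch is illustrative only: `looping_rules` is a hypothetical helper, `forward_rules` is assumed to map each forwarded domain name to the list of target IP addresses in its rule, and `inbound_endpoint_ips` is assumed to hold the IP addresses of inbound endpoints whose VPC resolver would match the same rule again.

```python
def looping_rules(forward_rules, inbound_endpoint_ips):
    """Return {domain: [target IPs]} for rules that point at inbound endpoints.

    Any entry in the result sends queries from the outbound endpoint
    back to an inbound endpoint, which is the pattern that can loop.
    """
    inbound = set(inbound_endpoint_ips)
    result = {}
    for domain, targets in forward_rules.items():
        overlap = sorted(set(targets) & inbound)
        if overlap:
            result[domain] = overlap
    return result
```

An empty dictionary means that no forwarding rule targets one of the listed inbound endpoint IP addresses.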

Issue: There are connectivity issues on Route 53 Resolver outbound endpoints

If there are connectivity issues between Route 53 Resolver outbound endpoints and the Resolver rule target IP addresses, then AmazonProvidedDNS returns SERVFAIL.

To resolve this issue, complete the following steps:

  • Verify network connectivity from the elastic network interfaces that the outbound endpoint created in the VPC to the target IP addresses:
    1. Check network access control lists (network ACLs).
    2. Make sure that the outbound endpoint's security group has an egress rule that allows TCP and UDP traffic over port 53 to the target IP addresses.
    3. Check any firewall rules that are configured on the target IP address end.
    4. Verify routing between the outbound endpoint elastic network interface and the target IP addresses.
  • By design, Route 53 Resolver outbound endpoint elastic network interfaces don't have public IP addresses. If the target DNS server is a public DNS resolver (for example, 8.8.8.8), then verify that the outbound endpoint is created in a private subnet with a route table entry for a NAT gateway.
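The security group check in the preceding steps can be sketched as a small rule evaluator. This is a simplified model, not the AWS API: `egress_allows_dns` is a hypothetical name, and each rule is assumed to be a dictionary with `protocol` (`"tcp"`, `"udp"`, or `"-1"` for all traffic), `from_port`, `to_port`, and `cidr` keys.

```python
import ipaddress


def egress_allows_dns(egress_rules, target_ip):
    """Return the subset of {"tcp", "udp"} allowed to target_ip on port 53."""
    target = ipaddress.ip_address(target_ip)
    allowed = set()
    for rule in egress_rules:
        # Skip rules whose destination range doesn't cover the target.
        if target not in ipaddress.ip_network(rule["cidr"]):
            continue
        if rule["protocol"] == "-1":
            # "All traffic" rules cover every protocol and port.
            allowed |= {"tcp", "udp"}
        elif rule["protocol"] in ("tcp", "udp") and rule["from_port"] <= 53 <= rule["to_port"]:
            allowed.add(rule["protocol"])
    return allowed
```

A result other than `{"tcp", "udp"}` indicates that at least one of the two protocols is missing from the egress rules for that target.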

Issue: There's a missing glue record in the parent zone

Example

Within the public hosted zone for domain "example.com", there's a subdomain delegation for "glue.example.com" that points to the subdomain's name servers. However, the glue records don't exist in the public hosted zone "example.com", as shown in the following example:

Parent public hosted zone for domain "example.com"

example.com         NS    ns1.example.com, ns2.example.com, ns3.example.com, ns4.example.com
glue.example.com    NS    ns1.glue.example.com, ns2.glue.example.com, ns3.glue.example.com, ns4.glue.example.com

Subdomain public hosted zone for domain "glue.example.com"

glue.example.com        NS    ns1.glue.example.com, ns2.glue.example.com, ns3.glue.example.com, ns4.glue.example.com
glue.example.com        A     1.2.3.4
ns1.glue.example.com    A     3.3.3.3
ns2.glue.example.com    A     4.4.4.4
ns3.glue.example.com    A     5.5.5.5
ns4.glue.example.com    A     6.6.6.6

To resolve this issue, create the glue records for subdomain "glue.example.com." in the parent domain hosted zone.

Parent public hosted zone for domain "example.com"

example.com             NS    ns1.example.com, ns2.example.com, ns3.example.com, ns4.example.com
glue.example.com        NS    ns1.glue.example.com, ns2.glue.example.com, ns3.glue.example.com, ns4.glue.example.com
glue.example.com        A     1.2.3.4
ns1.glue.example.com    A     3.3.3.3
ns2.glue.example.com    A     4.4.4.4
ns3.glue.example.com    A     5.5.5.5
ns4.glue.example.com    A     6.6.6.6
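The glue check can be sketched as follows. The helper is hypothetical and works on plain `(name, type, values)` tuples that you copy from the parent hosted zone. It flags delegated subdomains whose name servers sit inside the delegated subdomain but have no A or AAAA record in the parent zone. It skips the apex NS record because glue for the apex lives at the registrar.

```python
def missing_glue(apex, parent_records):
    """Return {subdomain: [name servers without address records]}.

    parent_records is a list of (name, record_type, values) tuples
    taken from the parent hosted zone.
    """
    def norm(name):
        return name.rstrip(".").lower()

    with_address = {
        norm(name) for name, rtype, _ in parent_records if rtype in ("A", "AAAA")
    }
    missing = {}
    for name, rtype, values in parent_records:
        zone = norm(name)
        if rtype != "NS" or zone == norm(apex):
            continue
        # Only name servers inside the delegated subdomain need glue.
        in_zone = [norm(v) for v in values if norm(v) == zone or norm(v).endswith("." + zone)]
        lacking = sorted(ns for ns in in_zone if ns not in with_address)
        if lacking:
            missing[zone] = lacking
    return missing
```

For the first parent zone example in this section, the function reports all four `glue.example.com` name servers. For the corrected parent zone, it returns an empty dictionary.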

Issue: The max-recursion-depth is exceeded

If the response for the queried domain has a depth of more than nine, then the max-recursion-depth is exceeded. The response must be a chain of no more than eight CNAME records and a final A/AAAA record.

To resolve this issue, reduce the number of CNAME records in the response. To prevent loops, Route 53 Resolver supports a maximum depth of nine (chain of eight CNAMEs and an A/AAAA record).
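To make the limit concrete, the following sketch counts the CNAME records in a chain. It's a toy model rather than a resolver: `cname_map` is an assumed dictionary that maps each alias to its CNAME target, and names that aren't in the dictionary are treated as the final A/AAAA record.

```python
MAX_CNAMES = 8  # Per this section: at most eight CNAMEs plus a final A/AAAA record.


def cname_chain(name, cname_map):
    """Follow CNAMEs from name and return every name visited, in order."""
    chain = [name]
    seen = {name}
    while chain[-1] in cname_map:
        target = cname_map[chain[-1]]
        if target in seen:
            raise ValueError(f"CNAME loop detected at {target}")
        chain.append(target)
        seen.add(target)
    return chain


def exceeds_depth(name, cname_map):
    """Return True when the chain contains more than MAX_CNAMES CNAME records."""
    return len(cname_chain(name, cname_map)) - 1 > MAX_CNAMES
```

A chain of eight aliases that ends in an address record passes the check. A ninth alias makes `exceeds_depth` return `True`.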

AWS OFFICIAL
Updated a year ago