How resilient is Route 53 DNSSEC to us-east-1 outages?


The Route 53 control plane available to customers runs in us-east-1. During regional outages, it becomes impossible to make configuration changes, but existing configurations continue working (at least as far as I can remember). I don't know if there are offsite backups, but they are at least theoretically possible.

Enabling DNSSEC apparently adds an ongoing dependency on the Route 53 control plane and on KMS in us-east-1, because the DNSKEY record set has to be re-signed by the KSK every 8 hours and distributed to the edge PoPs.

Does this make Route 53 zones less resilient?

DNSKEY record sets are obviously signed and distributed to PoPs ahead of time, but how much of a buffer is there? A few minutes? A day? A week? What if an outage started at 23:59 UTC, or lasted longer than 8 hours?
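One rough, outside-in way to look at the buffer is to read the RRSIG that covers the DNSKEY record set (for example from `dig +dnssec DNSKEY <zone>`) and compare its expiration time with the current time. The sketch below only illustrates that arithmetic: the helper names and the sample record are made up, and the expiration of the currently published signature says nothing about how many future signatures may already have been generated.

```python
from datetime import datetime, timedelta, timezone

# RFC 4034 presentation format for RRSIG timestamps (always UTC).
# dig prints this form; the RFC also allows plain epoch seconds, not handled here.
RRSIG_TIME_FORMAT = "%Y%m%d%H%M%S"

def rrsig_window(rdata: str):
    """Return (inception, expiration) parsed from RRSIG RDATA text.

    RDATA field order: type covered, algorithm, labels, original TTL,
    expiration, inception, key tag, signer name, signature.
    """
    fields = rdata.split()
    expiration = datetime.strptime(fields[4], RRSIG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    inception = datetime.strptime(fields[5], RRSIG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    return inception, expiration

def remaining_validity(rdata: str, now: datetime) -> timedelta:
    """Time left before the signature expires (negative if already expired)."""
    _, expiration = rrsig_window(rdata)
    return expiration - now

# Hypothetical RRSIG RDATA, invented for illustration only.
SAMPLE = "DNSKEY 13 2 3600 20240102080000 20240101220000 12345 example.com. c2lnbmF0dXJl"

if __name__ == "__main__":
    now = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
    print(remaining_validity(SAMPLE, now))
```

Sampling this value over several days would show how far ahead the published signatures reach, though not how deep any pre-signed reserve is.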

Is there some kind of offsite key export and DR plan that would allow Route 53 DNSSEC operations to continue during an extended us-east-1 outage?

(I'm assuming that Route 53 is resilient to control plane outages when DNSSEC is not enabled, but for all I know that may not be the case. For example, during the Facebook backbone outage, their DNS sites automatically disabled themselves when they lost access to the control plane for too long. I don't know if Route 53 has a similar design.)

asked 2 years ago, 1108 views
1 Answer
Accepted Answer

Enabling DNSSEC apparently adds an ongoing dependency on the Route 53 control plane and on KMS in us-east-1, because the DNSKEY record set has to be re-signed by the KSK every 8 hours and distributed to the edge PoPs.

DNSSEC does add an eventual dependency on the Route 53 control plane and KMS. We do need to re-sign the ZSKs (and also rotate them) at regular intervals. The dependency is designed to be very slow to become an issue. Somewhat like the way the root nameservers operate, Route 53 pre-signs new ZSKs ahead of time and replicates them out to the DNS data plane, so that we can keep rotating and signing ZSKs for weeks before this dependency becomes an issue.
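To make the "weeks" figure concrete, here is a toy model of pre-signing. It is not a description of Route 53 internals: the function names, the 8-hour validity period (taken from the question) and the three-week lookahead are all assumptions chosen only for illustration, and real signature windows typically overlap rather than sitting exactly back to back.

```python
from datetime import datetime, timedelta, timezone

def presigned_windows(start, validity, lookahead):
    """Build back-to-back (begin, end) validity windows that together cover `lookahead`."""
    windows = []
    begin = start
    while begin < start + lookahead:
        windows.append((begin, begin + validity))
        begin += validity
    return windows

def survives_until(windows):
    """Last instant covered if no further signatures are ever produced."""
    return max(end for _, end in windows)

def outage_is_covered(windows, outage_start, outage_length):
    """True if the whole outage falls inside the pre-signed range.

    Assumes the windows are contiguous, as produced by presigned_windows().
    """
    first_begin = min(begin for begin, _ in windows)
    return first_begin <= outage_start and outage_start + outage_length <= survives_until(windows)

if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Assumed values: 8-hour signatures, pre-generated three weeks ahead.
    windows = presigned_windows(start, timedelta(hours=8), timedelta(days=21))
    outage_start = datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)
    print(len(windows))
    print(outage_is_covered(windows, outage_start, timedelta(days=3)))
    print(outage_is_covered(windows, outage_start, timedelta(days=30)))
```

Under these assumed numbers, an outage that starts at 23:59 UTC and lasts a few days stays within the pre-signed range, while a month-long outage would outlast it.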

Is there some kind of offsite key export and DR plan that would allow Route 53 DNSSEC operations to continue during an extended us-east-1 outage?

The keys are stored in KMS across multiple availability zones (data centers which are miles apart) for redundancy.

(I'm assuming that Route 53 is resilient to control plane outages when DNSSEC is not enabled, but for all I know that may not be the case. For example, during the Facebook backbone outage, their DNS sites automatically disabled themselves when they lost access to the control plane for too long. I don't know if Route 53 has a similar design.)

Route 53's data plane is explicitly designed to be "statically stable" in the face of, for example, a control plane failure or partition event. You can read more about how AWS thinks about this here:

https://aws.amazon.com/builders-library/static-stability-using-availability-zones/
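As a generic illustration of the static stability pattern (not Route 53's actual implementation), the sketch below shows a data plane node that keeps answering from its last known good configuration when the control plane cannot be reached. The class name, method names and the use of `ConnectionError` are all hypothetical.

```python
class DataPlaneNode:
    """Toy node that serves from the last configuration it fetched successfully."""

    def __init__(self, fetch_config):
        # fetch_config is a hypothetical callable that returns a dict of
        # name -> answer, or raises ConnectionError if the control plane is down.
        self._fetch_config = fetch_config
        self._last_good = None

    def refresh(self):
        """Try to pull new configuration; keep the old one on failure."""
        try:
            self._last_good = self._fetch_config()
            return True
        except ConnectionError:
            return False

    def answer(self, name):
        """Answer from the last good configuration, even if it is stale."""
        if self._last_good is None:
            raise LookupError("no configuration has ever been loaded")
        return self._last_good.get(name)

if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_control_plane():
        # Succeeds once, then simulates a control plane outage.
        calls["count"] += 1
        if calls["count"] > 1:
            raise ConnectionError("control plane unreachable")
        return {"www.example.com.": "192.0.2.10"}

    node = DataPlaneNode(flaky_control_plane)
    node.refresh()   # succeeds
    node.refresh()   # fails, old configuration is kept
    print(node.answer("www.example.com."))
```

The design choice is that a failed refresh never clears or disables what is already being served, which is the opposite of the self-disabling behavior described in the question.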

AWS
EXPERT
gavinmc
answered 2 years ago
  • Thank you for the answer! "Weeks" and "static stability" were the key things I wanted to know. :-) It's good that there's AZ redundancy, but that's not the same thing as regional redundancy. (I'm aware that designs have trade-offs, and that if Virginia was nuked, humanity would have a long list of problems.)

  • Hypothetically, Route 53 could use multiple KSKs stored in KMS in different regions.

  • I believe the root zone has HSMs in both Virginia and California. ;-)
