For Direct Connect resiliency recommendations, refer to this page: https://aws.amazon.com/directconnect/resiliency-recommendation/
The one shown in the Whitepaper is just an example.
For question #2 (what if the entire PoP in one region goes down): your option would be an IPsec VPN over the internet as a backup. Keep in mind the limitations of IPsec VPN: 1.25 Gbps of bandwidth per tunnel, no SLA, and so on. Some customers also have an MPLS network between their data centers that they use as a backup path; you can see such an example in this blog (figure 7):
In the first link, the difference between "high" and "maximum" resilience is that in the "maximum" model, if a single AWS Direct Connect edge router or any of the technical components related to it (such as transceivers, cables, or switch/router ports on your side) were to fail, connectivity would still be maintained via that same DX location, because you'd still have the other connection to the same DX location, independent of the failed link.
For practical operational purposes, you'd also see this difference prominently during maintenance on any of the devices on a given path, such as routine firmware or software updates. In the "high" resilience model, you'd be left with only one active physical path for the duration of the maintenance. By comparison, in the "maximum" model, you'd still have three fully functional links across two different DX locations, and any one of those three would suffice to provide the needed connectivity.
One simple way to think of the difference would be that "high" means n+1 redundancy for the physical link layer, and "maximum" represents n+3 redundancy. At the level of entire PoPs (such as for complete loss of power), the redundancy would be n+1 in both cases, though.
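To make the n+1 versus n+3 counting concrete, here is a minimal sketch that models each Direct Connect connection as a (location, router) pair. The location and router labels are hypothetical placeholders, and the model only counts links; it does not reflect any AWS API.

```python
from typing import NamedTuple


class Link(NamedTuple):
    location: str  # DX location (PoP); hypothetical label
    router: str    # AWS DX edge router at that location; hypothetical label


# "High" resilience: one connection at each of two DX locations.
HIGH = [Link("loc-1", "router-a"), Link("loc-2", "router-a")]

# "Maximum" resilience: two connections on separate routers at each of two DX locations.
MAXIMUM = [
    Link("loc-1", "router-a"), Link("loc-1", "router-b"),
    Link("loc-2", "router-a"), Link("loc-2", "router-b"),
]


def spare_links(links: list) -> int:
    """Links beyond the single one needed for connectivity (the k in n+k)."""
    return len(links) - 1


def spare_locations(links: list) -> int:
    """DX locations beyond the single one needed for connectivity."""
    return len({link.location for link in links}) - 1


def after_link_failure(links: list, failed: Link) -> list:
    """Links still up after one link (or a device on its path) fails."""
    return [link for link in links if link != failed]


def after_location_failure(links: list, location: str) -> list:
    """Links still up after an entire DX location is lost."""
    return [link for link in links if link.location != location]
```

In this model, `spare_links` gives 1 for `HIGH` and 3 for `MAXIMUM`, while `spare_locations` gives 1 for both, which matches the n+1 redundancy at the PoP level in either design.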
The second diagram with the transit gateways represents the "high" resilience model, as you correctly observed. Each on-premises data centre has single connections to two DX locations, so they'd be highly available but with only one physical link and set of devices to spare. If any individual component or link is lost, connectivity is maintained (because it's n+1-redundant end to end) but redundancy is lost. Any second failure on the remaining link would lead to effective loss of connectivity.
The second diagram is intended to illustrate the relationship between the regional and local nature of both AWS regions and the on-premises data centres, combined with the global nature of AWS's backbone network. A Direct Connect gateway isn't a virtual device residing in any specific physical location; instead, it represents a VRF (virtual routing and forwarding instance, effectively a virtual network) on AWS's global backbone network. That's why, even though each corporate data centre is only connected to DX locations in two regions (DC 1 to regions A+B, DC 2 to regions C+D), it's able to reach the transit gateways in all four regions, A-D, via the single Direct Connect gateway.
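The reachability described above can be sketched as a toy lookup. This is only an illustration of the reasoning, not an AWS API: the dictionary names are hypothetical, and the region letters follow the diagram.

```python
# DX locations (named by region) that each data centre connects to, per the diagram.
DC_CONNECTIONS = {"DC 1": {"A", "B"}, "DC 2": {"C", "D"}}

# Regions whose transit gateways are associated with the single Direct Connect gateway.
DXGW_ASSOCIATED_REGIONS = {"A", "B", "C", "D"}


def reachable_regions(dc: str) -> set:
    """A DC with at least one connection into the DX gateway reaches every
    associated region, regardless of where its DX locations are."""
    if DC_CONNECTIONS.get(dc):
        return set(DXGW_ASSOCIATED_REGIONS)
    return set()


print(sorted(reachable_regions("DC 1")))
```

The point of the sketch is that reachability depends on the gateway's associations, not on which regions the data centre's own DX locations sit in.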
Note also that, precisely because of that structure, you shouldn't try to make a Direct Connect gateway redundant by setting up a second one. AWS's software-defined network understands that connections to a DX gateway lead to the same network, and therefore it makes every attempt to keep all the Direct Connect connections associated with it on physically separate routes as much as possible. No such coordination occurs between two separate DX gateways, because by separating them, you'd be telling AWS's network explicitly that they are separate networks and not interdependent.
If you want to implement the four-region (or more) connectivity in the second link with the maximum resilience approach for Direct Connects in the first link, you should connect "corporate DC 1," for example, with four physical links: two of them connected to two different AWS DX edge routers at one DX location, and the other two links connected to two different DX edge routers at another DX location.
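As a rough way to check a planned layout against that description, the sketch below tests whether a set of links covers at least two DX locations with at least two distinct edge routers at each. The function name and all labels are hypothetical; it models only the topology rule stated above.

```python
from collections import defaultdict


def is_maximum_resilience(links) -> bool:
    """True if at least two DX locations each terminate links on two or more
    distinct edge routers. `links` is an iterable of (location, router) pairs."""
    routers_by_location = defaultdict(set)
    for location, router in links:
        routers_by_location[location].add(router)
    redundant_locations = [
        loc for loc, routers in routers_by_location.items() if len(routers) >= 2
    ]
    return len(redundant_locations) >= 2


# Four physical links from "corporate DC 1" (hypothetical labels).
dc1_links = [
    ("dx-loc-1", "edge-1"), ("dx-loc-1", "edge-2"),
    ("dx-loc-2", "edge-1"), ("dx-loc-2", "edge-2"),
]
print(is_maximum_resilience(dc1_links))
```

Dropping to one link per location (the "high" model) makes the check fail, since neither location then has a second router to fall back on.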
With regard to site-to-site VPNs, I'd suggest being careful about making assumptions about them. Each AWS site-to-site VPN provides a maximum throughput of 1.25 Gbit/s for the entire VPN (in total across the two redundant tunnels). Also, as with any VPN, the IPsec and potential NAT-T encapsulations reduce the maximum packet size (MTU, maximum transmission unit), which may negatively affect some applications or infrastructure components, or require configuration changes to them.
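To illustrate the MTU effect, here is a small arithmetic sketch. The 60-byte encapsulation overhead is purely an illustrative assumption: the real figure depends on the cipher, integrity algorithm, and whether NAT-T is in use, so check the actual values for your tunnel.

```python
def tunnel_mtu(link_mtu: int, encapsulation_overhead: int) -> int:
    """Largest inner packet that fits once tunnel headers are added."""
    return link_mtu - encapsulation_overhead


def tcp_mss(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """TCP payload per packet for IPv4 with no IP or TCP options."""
    return mtu - ip_header - tcp_header


ILLUSTRATIVE_OVERHEAD = 60  # assumption for this example only, not an AWS figure

inner_mtu = tunnel_mtu(1500, ILLUSTRATIVE_OVERHEAD)
print(inner_mtu, tcp_mss(inner_mtu))
```

Applications or devices that assume a full 1500-byte path may therefore need MSS clamping or a lower configured MTU when traffic fails over to the VPN.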