What is the typical and maximum time for traffic to a reallocated Elastic IP address to reach the new network interface?

For disaster recovery planning purposes:

What is the typical and maximum time for traffic to a reallocated Elastic IP address to reach the new network interface?

For example, I have two network interfaces in different AZs in the same region. The Elastic IP address is associated with the network interface in AZ-1. A monitor detects an issue in AZ-1 and reassociates the Elastic IP with the network interface in AZ-2.

What is a typical timeframe, after the reassociation, before the network interface in AZ-2 starts receiving traffic sent to the Elastic IP?

I couldn't find this documented anywhere, though I could very easily just be missing it. If documentation exists with timings useful for RTOs in typical AWS DR scenarios across various services, that would definitely be a bonus answer to this question!

mjans71
asked a year ago, 216 views
1 Answer
We don't supply estimated times for many things and this is one of those. My best advice here is to test and to take the numbers from there.

In the event of a large-scale failure those numbers might not be realistic - particularly as there will be many other customers all trying to do the same thing. So definitely plan on the worst-case.
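One way to collect such test numbers is a small probe loop that starts right after the Elastic IP is reassociated and records how long it takes until responses come from the new instance. The following is a minimal sketch, not an AWS-provided tool. It assumes each instance serves its own identifier over HTTP at a hypothetical endpoint such as `http://<elastic-ip>/whoami`; the function names, the endpoint, and the identifiers are all illustrative assumptions.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout_s: float = 1.0) -> Callable[[], Optional[str]]:
    """Build a probe that returns the response body, or None on any failure.

    Assumption: the target instance answers with its own identifier.
    """
    def probe() -> Optional[str]:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read().decode().strip()
        except (OSError, http.client.HTTPException):
            # Timeouts and resets are expected while traffic is switching over.
            return None
    return probe


def measure_switchover(
    probe: Callable[[], Optional[str]],
    expected_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 0.2,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it returns `expected_id`.

    Returns the elapsed seconds, or None if the timeout is reached first.
    """
    start = clock()
    while True:
        seen = probe()
        elapsed = clock() - start
        if seen == expected_id:
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)


# Hypothetical usage: call this immediately after reassociating the address.
# elapsed = measure_switchover(http_probe("http://203.0.113.10/whoami"), "az2")
```

Running this many times, at different times of day, gives a distribution rather than a single figure. The result is only as fine-grained as the polling interval and the probe timeout, and it says nothing about behaviour during a real large-scale event.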

If it were me, I would try to avoid making any changes in the event of a failure, because you don't know what has failed and therefore what will work and what won't, and it's next to impossible to test for that. Instead, build a system that is active in two (or three) AZs all the time. I appreciate that this is a non-trivial thing to ask and to design for, but if you have an application that must be up all of the time then this is a good strategy.

AWS
EXPERT
answered a year ago
  • Thanks for your quick reply!

    This exact pattern is recommended in the "Limitations and Extensibility" section of the "Floating IP pattern for HA between active–standby stateful servers" page (https://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/floating-ip-pattern-for-ha-between-activestandby-stateful-servers.html).

    To your point, I have seen other AWS blog posts on DR warning against control plane changes during a failover. Is there a DR guide that covers core AWS services and the dos and don'ts of AZ-failure-resilient DR?

    Thanks again for your response and your insight.

  • Reflecting more on your third paragraph: are you saying that the only reliable AZ-failure-resilient DR strategy is active/active? Or is it that, as we learn more from actual AZ outages, approaches that previously seemed to work hold up less well (perhaps because everyone implements the same previously recommended patterns, which causes more problems during an outage), so that anything less than active/active is trickier to get right than was previously thought and might not be worth it without an intimate understanding of those issues?

  • There's a famous quote from Werner Vogels (Amazon CTO): "Everything fails all the time". To add to that: complex systems (of which an AWS AZ could be counted as one) fail in ways that are ... complex. So while some things may continue to work during a failure event, other things will not, and it's not possible to predict which will and which won't. The next failure may not be like the last. Many components go into making up an AZ, and a failure there could be anything from large-scale to small-scale. Planning is good! Great, even! But in my opinion the best thing you can do in these events is to be already running in another AZ.
