Route 53 Active-Passive Failover Goes Back And Forth Between Primary and Secondary Due To Unstable Primary, i.e. Does Not Settle

Hi community!

I'm seeking suggestions to improve my Route 53 active-passive failover setup. On a couple of occasions when my primary web server was unstable, Route 53 flipped back and forth between the primary and secondary A records instead of settling on the secondary. To make it settle on the secondary, I had to stop the web server process on the primary server by hand.

At present my health check for the failover is:

Thank you in advance! Tatsuo

Tatsuo
asked 2 years ago, 530 views
2 Answers
Accepted Answer

For Route 53 failover routing to work properly, it is critical that your health checks behave reliably.

There are three types of active-passive failover that can be set up, as described in this doc: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html

  1. active-passive failover with one primary and one secondary resource
  2. active-passive failover with multiple primary and secondary resources
  3. active-passive failover with weighted records

In your case, which one have you configured?

Note: Please accept my answer if you like it. Thanks

AWS
EXPERT
answered 2 years ago

Thanks, Indranil. In my case it is option 1: active-passive failover with one primary and one secondary resource.

Tatsuo
answered 2 years ago
  • Did you notice problems under heavy load? If the machine is heavily loaded, the health check URL may take too long to respond, which could trigger a failover. You should run performance tests and scale vertically or, even better, horizontally (put an ALB in front of your EC2 machines) in each region.

    Another change to consider is moving to containers, with either ECS or EKS for container orchestration. That would involve significant re-architecture, but it may be worth the effort in the end.

  • You are right, Indranil:

    1. my primary web server experiences a heavy load and becomes slow to respond
    2. the health check against my primary web server fails
    3. Route 53 fails over the A record to my secondary web server
    4. my primary web server no longer experiences a heavy load because it is no longer handling much traffic
    5. the health check against my primary web server succeeds
    6. Route 53 brings my primary web server back into service
    7. now that it is back in service, it again experiences a heavy load
    8. the health check against my primary web server fails
    9. the process continues!
  • Indranil, I will explore your scaling suggestions!
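
The loop described in steps 1 to 9 above can be sketched as a small simulation. This is only an illustrative model, not Route 53's actual behaviour: it assumes the primary fails its health check exactly when it is receiving traffic, and that the health status flips after a fixed number of consecutive contrary results. The class `FailoverSim`, the `failure_threshold` value of 3, and the `latch_on_secondary` flag are all hypothetical names and values chosen for the example. The latch mimics the manual workaround from the question (stopping the web server on the primary so that its check keeps failing).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailoverSim:
    # Consecutive contrary check results needed to flip health status (assumed value).
    failure_threshold: int = 3
    # Hypothetical switch: once failed over, keep the primary down (manual workaround).
    latch_on_secondary: bool = False

    def run(self, ticks: int) -> List[str]:
        active = "primary"
        healthy = True   # primary's health status as seen by the health check
        streak = 0       # consecutive results that disagree with `healthy`
        latched = False
        history = []
        for _ in range(ticks):
            # Assumption: the primary is overloaded whenever it serves the traffic.
            check_passes = (active != "primary") and not latched
            if check_passes != healthy:
                streak += 1
                if streak >= self.failure_threshold:
                    healthy = check_passes
                    streak = 0
            else:
                streak = 0
            if not healthy and self.latch_on_secondary:
                latched = True
            active = "primary" if healthy else "secondary"
            history.append(active)
        return history


def count_switches(history: List[str]) -> int:
    """Number of times the active record changes between consecutive ticks."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


if __name__ == "__main__":
    flapping = FailoverSim().run(12)
    settled = FailoverSim(latch_on_secondary=True).run(12)
    print(count_switches(flapping), flapping)
    print(count_switches(settled), settled)
```

Under these assumptions the unlatched run keeps alternating between primary and secondary, while the latched run fails over once and stays on the secondary. The point of the sketch is that the flapping comes from the feedback between traffic and health, so removing the overload (for example through the scaling suggestions above) is what breaks the cycle.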
