Route 53 Active-Passive Failover Goes Back And Forth Between Primary and Secondary Due To Unstable Primary, i.e. Does Not Settle


Hi community!

I'm looking for suggestions to improve my Route 53 active-passive failover setup. On a couple of occasions when my primary web server was unstable, Route 53 kept switching back and forth between the primary and secondary A records instead of settling on the secondary. To make it settle on the secondary, I had to manually stop the web server process on the primary server.

At present, my health check for the failover is:

Thank you in advance! Tatsuo

Tatsuo
asked 2 years ago, 553 views

2 answers

Accepted answer

For Route53 failover routing to work properly, it is absolutely critical that your health checks work properly.
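For reference, the Route 53 API describes an endpoint health check with a `HealthCheckConfig` structure. The sketch below only builds that structure as a plain Python dict; the IP address, port, `/health` path, and the helper names are hypothetical placeholders. Two settings matter most for flapping: `RequestInterval` (10 or 30 seconds) and `FailureThreshold` (1 to 10 consecutive checks). Route 53 applies the same `FailureThreshold` both when marking an endpoint unhealthy and when marking it healthy again.

```python
# Sketch only: builds a dict shaped like Route 53's HealthCheckConfig.
# Endpoint values and helper names are hypothetical.

VALID_REQUEST_INTERVALS = (10, 30)  # seconds; the only values Route 53 accepts


def build_health_check_config(ip_address, port=80, resource_path="/health",
                              request_interval=30, failure_threshold=3):
    """Return an HTTP health check definition for one web server."""
    if request_interval not in VALID_REQUEST_INTERVALS:
        raise ValueError("RequestInterval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("FailureThreshold must be between 1 and 10")
    return {
        "IPAddress": ip_address,
        "Port": port,
        "Type": "HTTP",
        "ResourcePath": resource_path,
        "RequestInterval": request_interval,
        "FailureThreshold": failure_threshold,
    }


def approx_seconds_to_flip(config):
    """Rough estimate of how long results must stay consistent before the
    status changes (applies in both directions, since the threshold is shared)."""
    return config["RequestInterval"] * config["FailureThreshold"]


cfg = build_health_check_config("203.0.113.10")
print(approx_seconds_to_flip(cfg))
```

With boto3 installed, the resulting dict could be passed as `HealthCheckConfig` to `route53.create_health_check(CallerReference=..., HealthCheckConfig=cfg)`.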

There are three ways to set up active-passive failover, as described in this doc: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html

  1. active-passive failover with one primary and one secondary resource
  2. active-passive failover with multiple primary and secondary resources
  3. active-passive failover with weighted records
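For option 1, the configuration is a pair of A records with the same name: one with `Failover` set to `PRIMARY` and a health check attached, and one set to `SECONDARY`, where the health check is optional. The sketch below builds the `ChangeBatch` you would send to `change_resource_record_sets` as plain data. The record name, IP addresses, health check ID, and `SetIdentifier` values are hypothetical.

```python
# Sketch only: a ChangeBatch for a one-primary/one-secondary failover pair.
# Names, IPs, and health check IDs are hypothetical placeholders.


def failover_record(name, ip, role, health_check_id=None, ttl=60):
    """One A record of a failover pair; role is "PRIMARY" or "SECONDARY"."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),  # must be unique within name + type
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name, primary_ip, secondary_ip,
                          primary_health_check_id, secondary_health_check_id=None):
    """UPSERT both records so the pair is created or updated together."""
    return {
        "Comment": "active-passive failover, one primary and one secondary",
        "Changes": [
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, primary_ip, "PRIMARY",
                                                  primary_health_check_id)},
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, secondary_ip, "SECONDARY",
                                                  secondary_health_check_id)},
        ],
    }


batch = failover_change_batch("www.example.com.", "203.0.113.10",
                              "203.0.113.20", "hc-primary-id")
```

With boto3, this could be applied via `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.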

In your case, which one have you configured?

Note: Please accept my answer if you like it. Thanks

AWS EXPERT
answered 2 years ago

Thanks, Indranil. In my case it is option 1: active-passive failover with one primary and one secondary resource.

Tatsuo
answered 2 years ago
  • Did you notice the problems under heavy load? If the machine is heavily loaded, the health check URL can take too long to respond, and that can trigger a failover. You should run performance tests and scale vertically or, even better, horizontally (put an ALB in front of your EC2 instances) in each region.

    Another option to consider is running the application in containers, using either ECS or EKS for container orchestration. That would involve significant re-architecture, but it may be worth the effort in the end.

  • You are right, Indranil:

    1. my primary web server comes under heavy load and becomes slow to respond
    2. the health check against my primary web server fails
    3. Route 53 fails over the A record to my secondary web server
    4. my primary web server is no longer under heavy load because it's no longer receiving much traffic
    5. the health check against my primary web server succeeds
    6. Route 53 brings my primary web server back into service
    7. once it is back in service, it comes under heavy load again
    8. the health check against my primary web server fails
    9. and the cycle repeats!
  • Indranil, I will explore your scaling suggestions!
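The feedback loop in steps 1 to 9 can be reproduced with a toy simulation. This is a simplified model, not how Route 53 works internally: each tick stands for one health-check interval, DNS changes take effect instantly (TTL caching is ignored), and the primary receives all traffic while healthy and none otherwise. `traffic` and `primary_capacity` are hypothetical units, and the function names are my own.

```python
# Toy model of the primary/secondary flip-flop; all numbers are hypothetical.


def simulate_failover(traffic, primary_capacity, ticks=40, failure_threshold=3):
    """Return which server DNS pointed at on each tick."""
    primary_healthy = True
    streak = 0  # consecutive check results that disagree with the current status
    history = []
    for _ in range(ticks):
        # Simplification: the primary gets all traffic when healthy, none otherwise.
        load = traffic if primary_healthy else 0
        check_passes = load <= primary_capacity  # overloaded => slow => check fails
        if check_passes != primary_healthy:
            streak += 1
            # The same threshold governs failing over and failing back.
            if streak >= failure_threshold:
                primary_healthy = check_passes
                streak = 0
        else:
            streak = 0
        history.append("primary" if primary_healthy else "secondary")
    return history


def count_flips(history):
    """Number of times DNS switched between primary and secondary."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


# Under-provisioned primary: keeps flipping. Enough capacity: never flips.
print(count_flips(simulate_failover(traffic=150, primary_capacity=100)))
print(count_flips(simulate_failover(traffic=150, primary_capacity=200)))
```

In this model, raising `failure_threshold` only stretches each cycle, because recovery uses the same threshold. Adding capacity, as Indranil suggests, is what makes the loop disappear.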
