Route 53 Active-Passive Failover Goes Back And Forth Between Primary and Secondary Due To Unstable Primary, i.e. Does Not Settle


Hi community!

I'm seeking suggestions to improve my Route 53 active-passive failover setup. On a couple of occasions when my primary web server was unstable, Route 53 flipped back and forth between the primary and secondary A records and did not settle on the secondary. To make it settle on the secondary, I had to manually stop the web server process on the primary server.

At present my health check for the failover is:

Thank you in advance! Tatsuo

Tatsuo
Asked 2 years ago, 552 views
2 Answers
Accepted Answer

For Route53 failover routing to work properly, it is absolutely critical that your health checks work properly.

There are three types of active-passive failover that can be set up, as described in this doc: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html

  1. active-passive failover with one primary and one secondary resource
  2. active-passive failover with multiple primary and secondary resources
  3. active-passive failover with weighted records
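As a rough illustration of option 1, the two failover records can be described as a Route 53 change batch. This is only a sketch: the domain name, IP addresses, TTL, and health check ID below are hypothetical placeholders, and the helper `failover_change_batch` is made up for this example.

```python
import json


def failover_change_batch(name, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a change batch with one PRIMARY and one SECONDARY failover A record."""

    def record(role, ip, extra):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": role.lower(),  # must be unique per record in the set
            "Failover": role,               # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Changes": [
            # Only the primary carries a health check in this sketch.
            record("PRIMARY", primary_ip, {"HealthCheckId": health_check_id}),
            record("SECONDARY", secondary_ip, {}),
        ]
    }


# All values here are placeholders, not taken from the question.
batch = failover_change_batch(
    "www.example.com.", "192.0.2.10", "198.51.100.20", "hypothetical-health-check-id"
)
print(json.dumps(batch, indent=2))
```

A dictionary of this shape is what the `ChangeBatch` argument of the Route 53 `ChangeResourceRecordSets` API expects, for example through boto3's `change_resource_record_sets`, together with your hosted zone ID.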

In your case, which one have you configured?

Note: Please accept my answer if you like it. Thanks

AWS
Expert
Answered 2 years ago

Thanks, Indranil. In my case it is option 1: active-passive failover with one primary and one secondary resource.

Tatsuo
Answered 2 years ago
  • Did you notice the problem under heavy load? If the health check URL is slow to respond because the machine is heavily loaded, the health check can fail and trigger a failover. You should run performance tests and scale vertically or, even better, horizontally (put an ALB in front of your EC2 instances) in each region.

    Another option is to move to containers and use either ECS or EKS for container orchestration. That would involve significant re-architecture, but it may be worth the effort in the end.

  • You are right, Indranil:

    1. my primary web server comes under heavy load and becomes slow to respond
    2. the health check against my primary web server fails
    3. Route 53 fails over the A record to my secondary web server
    4. my primary web server is no longer under heavy load because it is not receiving much traffic
    5. the health check against my primary web server succeeds
    6. Route 53 brings my primary web server back into service
    7. now that it is back in service, it comes under heavy load again
    8. the health check against my primary web server fails
    9. the cycle repeats!
  • Indranil, I will explore your scaling suggestions!
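
The loop in steps 1 to 9 can be reproduced with a toy model. This is a minimal sketch under simplifying assumptions: the load and capacity numbers are invented, a health check is taken to pass whenever the primary's load is within its capacity, and Route 53's check intervals and failure thresholds are ignored.

```python
def simulate_failover(rounds, total_load=100, primary_capacity=80, probe_load=1):
    """Return which server receives traffic after each health check round."""
    active = "primary"
    history = []
    for _ in range(rounds):
        # The primary sees all traffic while active, only health probes otherwise.
        load_on_primary = total_load if active == "primary" else probe_load
        healthy = load_on_primary <= primary_capacity
        # Failover routing: use the primary whenever its health check passes.
        active = "primary" if healthy else "secondary"
        history.append(active)
    return history


# Undersized primary: traffic flips on every round and never settles.
print(simulate_failover(4))
# Primary with headroom (e.g. after scaling): traffic stays on the primary.
print(simulate_failover(3, primary_capacity=120))
```

In this model the flapping stops only once the primary can carry the full load, which is consistent with the scaling suggestion above.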
