RDS Failover Scenario

I have an Oracle RDS multi-AZ instance running in AWS in AZ1.

The standby is running in AZ2.

If there is a hardware or software failure, or a complete AZ failure, in AZ1, AWS fails over to the standby in AZ2, which becomes the new primary.

Does AWS create a new standby in AZ1 immediately after it fails over to the standby in AZ2, and start replicating from the server in AZ2 to AZ1?

What does AWS do about the new standby if AZ1 has failed completely?

Thanks,

1 Answer

I think the answer here will also answer your question: https://repost.aws/questions/QU4DYhqh2yQGGmjE_x0ylBYg/what-happens-after-failover-in-rds

The failed primary instance is diagnosed by the RDS internal health monitoring system, and remediation actions are taken based on the detected fault. Depending on the fault, remediation may range from simply rebooting the faulty instance to replacing the hardware. Once the old primary node is recovered, it is brought back up as the new standby instance, ensuring your DB's high availability.
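
One way to see which AZ currently hosts the primary and which is reported for the standby is to read the instance description returned by the RDS DescribeDBInstances API. The following is a minimal sketch, not an official procedure: the helper names `summarize_placement` and `fetch_placement`, the instance identifier, and the AZ names in the sample are made up for illustration. The thread does not establish how the `SecondaryAvailabilityZone` field behaves while a standby is being recovered, so treat that as something to verify.

```python
from typing import Optional, TypedDict


class Placement(TypedDict):
    multi_az: bool
    primary_az: Optional[str]
    standby_az: Optional[str]
    status: Optional[str]


def summarize_placement(db_instance: dict) -> Placement:
    """Summarize where a DB instance currently runs.

    `db_instance` is one element of the `DBInstances` list returned by
    the RDS DescribeDBInstances API.
    """
    return {
        "multi_az": bool(db_instance.get("MultiAZ", False)),
        "primary_az": db_instance.get("AvailabilityZone"),
        # Only reported for Multi-AZ instances; absent otherwise.
        "standby_az": db_instance.get("SecondaryAvailabilityZone"),
        "status": db_instance.get("DBInstanceStatus"),
    }


def fetch_placement(instance_id: str) -> Placement:
    """Look up a live instance. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so summarize_placement works without it

    rds = boto3.client("rds")
    response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    return summarize_placement(response["DBInstances"][0])


# Hypothetical response fragment after a failover from AZ1 to AZ2.
sample = {
    "DBInstanceIdentifier": "my-oracle-db",
    "MultiAZ": True,
    "AvailabilityZone": "us-east-1b",
    "SecondaryAvailabilityZone": "us-east-1a",
    "DBInstanceStatus": "available",
}
print(summarize_placement(sample))
```

Running `fetch_placement` before and after a failover would show the two AZ values swapping if the old primary comes back as the standby, as the quoted answer describes.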

Steve_M (EXPERT), answered 13 days ago
Reviewed by an AWS EXPERT 13 days ago
  • It does not cover my scenario of a complete failure of AZ1. Also, I am not sure how long it takes AWS to start up a new standby instance after a hardware failure, because with Multi-AZ replication both instances must be running simultaneously and the two DBs are updated at the same time.

  • Multi-AZ means that the workload can tolerate the loss of an AZ, which in the scenario here has happened - AZ1 is lost and the database is now running in AZ2.

    When AZ1 comes back online (e.g. power was restored after an outage, or a network connectivity issue was fixed), the standby will spin up in AZ1 while AZ2 continues to be the primary.

    I know what you're asking is whether, during the period where AZ1 is offline and AZ2 is flying solo, AWS will spin up a new replica in some AZ3 somewhere. The answer is no. If you need to build in resilience to the loss of two AZs, then consider a third replica, or (depending on the database engine) multi-region RDS.

  • According to your last comment, if AZ1 were down for a few days, the primary RDS instance in AZ2 would not have a standby DB running in another AZ. Correct? Regarding your comment about having two standbys (a third replica), I don't think Multi-AZ supports that. I think there is a new option for a Multi-AZ cluster with two read-only standbys, but I am not sure whether AWS will switch to the second machine if two nodes fail. Remember that replication is handled automatically by AWS with Multi-AZ, and there is only one standby.

  • Yes, my understanding (disclaimer that I don't work for AWS, I'm just a customer) is that the Multi-AZ offering is defined as the workload continuing to run if an AZ becomes unavailable. In the scenario here (AZ1 is lost and it continues to run in AZ2) it would seem that AWS has kept its end of the bargain, while they work on returning AZ1 to service so it can again host a standby instance.

    I don't think the customer is paying for (again, I don't work for AWS, and I'm not a contract lawyer) a guarantee that a standby instance will always exist. If the workload is so critical that guarding against one AZ failure event isn't enough, and a second AZ failure later has to be considered, then Multi-AZ DB clusters exist, which span three AZs.

    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/multi-az-db-clusters-concepts.html says: "A Multi-AZ DB cluster has a writer DB instance and two reader DB instances in three separate Availability Zones in the same AWS Region."

    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZ.html says: "A Multi-AZ DB cluster deployment has standby DB instances that provide failover support"

    According to the second link here, both standbys can be failed over to. What I take from that is if AZ1 is lost, AZ2 becomes the new primary, and AZ3 is still available for failover.
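
The difference between the two layouts discussed in the comments can be sketched with a toy model. This is only an illustration of the reasoning in the thread, not a description of how RDS chooses a failover target: the `Deployment` class is hypothetical, and promoting the first listed standby is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Toy model: one primary plus standbys, each in its own AZ."""
    primary_az: str
    standby_azs: list = field(default_factory=list)

    def lose_az(self, az: str) -> None:
        """Apply an AZ outage; promote a standby if the primary was there."""
        if az in self.standby_azs:
            self.standby_azs.remove(az)
        elif az == self.primary_az:
            if not self.standby_azs:
                raise RuntimeError("no standby left to fail over to")
            # Assumption: promote the first remaining standby.
            self.primary_az = self.standby_azs.pop(0)

    @property
    def can_fail_over(self) -> bool:
        """True while at least one standby is still available."""
        return bool(self.standby_azs)


# One standby: after losing AZ1, AZ2 runs alone.
two_az = Deployment("AZ1", ["AZ2"])
two_az.lose_az("AZ1")
print(two_az.primary_az, two_az.can_fail_over)      # AZ2 False

# Two standbys in separate AZs: after losing AZ1, AZ3 is still available.
three_az = Deployment("AZ1", ["AZ2", "AZ3"])
three_az.lose_az("AZ1")
print(three_az.primary_az, three_az.can_fail_over)  # AZ2 True
```

In the one-standby case the database keeps running but has nothing left to fail over to until AZ1 returns, which matches the "flying solo" period described above.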
