RDS Failover Testing

Hi

I need to plan, design, and test failover for some RDS Oracle production servers running in a Multi-AZ environment.

Are there any documents or videos that discuss the details of doing that? Would you use the actual RDS production servers for testing, and do you have to get the application owners involved?

Thanks,

2 Answers
Accepted Answer

Hi,

Since you are talking about failover in a Multi-AZ configuration, I take it that you are interested in testing failover between the AZs. You will find this knowledge article [1] helpful. It covers how to find the reasons, logs, and events after a failover, which should help you document your tests when you carry them out. To initiate a failover manually, as you may already know, reboot the DB instance with the failover option selected. For more information, see Rebooting a DB instance [2].
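As a rough illustration of the manual failover and the follow-up event check, here is a minimal Python sketch. It assumes the boto3 SDK, whose RDS client provides `reboot_db_instance` (with a `ForceFailover` flag) and `describe_events`. The helper names `force_failover` and `recent_events` and the identifier `my-test-oracle-db` are made up for this example, and the client is passed in so the helpers can be tried against a stub first.

```python
def force_failover(rds_client, instance_id):
    """Reboot a Multi-AZ DB instance with failover and return its reported status."""
    response = rds_client.reboot_db_instance(
        DBInstanceIdentifier=instance_id,
        ForceFailover=True,  # only meaningful for Multi-AZ deployments
    )
    return response["DBInstance"]["DBInstanceStatus"]


def recent_events(rds_client, instance_id, minutes=30):
    """Return the messages of RDS events logged for the instance in the last `minutes`."""
    response = rds_client.describe_events(
        SourceIdentifier=instance_id,
        SourceType="db-instance",
        Duration=minutes,  # look-back window, in minutes
    )
    return [event["Message"] for event in response["Events"]]


# Example usage (assumes AWS credentials and a region are already configured;
# "my-test-oracle-db" is a hypothetical instance identifier):
#
#   import boto3
#   rds = boto3.client("rds")
#   print(force_failover(rds, "my-test-oracle-db"))
#   for message in recent_events(rds, "my-test-oracle-db"):
#       print(message)
```

The event messages collected this way can go straight into the test record, alongside the timings observed from the application side.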

Would you use the actual RDS production servers?

That would depend on what mandate you are working on within your organization. If it is only functional proof you are after, doing it in a non-production environment would be a good idea.

Should you get application owners involved?

Again, this depends on the scope and intention of your test. To test any such failover end to end, involving the application owners is definitely recommended. On the other hand, if your scope is only infrastructure testing, you can do it with application activity/access turned off.

[1] https://repost.aws/knowledge-center/rds-multi-az-failover-restart

[2] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RebootInstance.html

Hope this helps.

AWS
EXPERT
answered 2 months ago
  • Yes, I want to test the actual RDS failover to a different AZ, since Multi-AZ always creates the standby in a different AZ.

    Yes, I also need to do end-to-end application testing, to ensure the application continues working seamlessly in case of a disaster or failure. I assume no DNS changes or application changes need to be made, and that AWS will update the DNS automatically with the new endpoint for the failover instance.

    Based on documentation you provided it seems the RDS failover takes about 60 seconds. Is there any possibility of data loss with RDS failover? If yes, how much is the max data loss that can happen (minutes, hours, days)?

    I do not think there is a way for me to access the failover DB before the switch occurs.

  • DNS Change: Yes, the DNS changes are automatic in case of multi-AZ RDS failover (see https://aws.amazon.com/blogs/database/amazon-rds-under-the-hood-multi-az/).

    Data Loss: A failover within a Multi-AZ RDS environment should incur no data loss (see: https://aws.amazon.com/blogs/database/choose-the-right-amazon-rds-deployment-option-single-az-instance-multi-az-instance-or-multi-az-database-cluster).

    Access to Standby: No, you can't access the standby before it completes its role change as the primary.

  • Great answers and articles!

    Shall I assume your answers cover the case where the primary RDS server in AZ1 fails over to the RDS server in AZ2 without switching over the application/web EC2 server? My thinking is that there can be two different failover situations: the first for an RDS failure only, where we continue to use APP/WEB server 1 in AZ1 with RDS server 2 in AZ2, and the second where Availability Zone AZ1 fails and we have to switch to APP/WEB server 2 and RDS server 2 in a different Availability Zone.

    Don't the RDS server endpoint and IP address change every time there is a failover? If yes, I have to update the application server's Oracle client config file, which has a TNS string with the RDS endpoint for connecting to the DB server.

  • That's correct. This discussion is only about RDS failover in a multi-AZ configuration, and nothing related to any application/web server switchover. You are also correct in saying that you need to look at app/web server failover scenarios too while considering AZ failure.

    The database instance endpoint and IP address change during the failover. And, yes, you should configure client connections to point to the cluster endpoint rather than the instance endpoint in a Multi-AZ environment. See: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/multi-az-db-clusters-concepts-connection-management.html

  • Hi, I don't have a Multi-AZ cluster configuration. I only have a single failover server. I have seen an article on Stack Overflow that mentions the endpoint stays the same after failover. Are you sure there will be a new endpoint? Do you normally continue production on the new failover server, or switch the users back to the old primary server?

Hello,

To answer your latest query about the endpoint, please check the scenario below and confirm whether it helps.

Multi-AZ setup:

Primary - AZ1 (DB is running here)
Secondary - AZ2 (DB is not running here)

Example

endpoint - test.<account>.<region>.rds.amazonaws.com ----> This always resolves to the IP address of your current primary instance (AZ1 in this example)

When you perform a failover, RDS automation updates the DNS record, and the endpoint then resolves to the IP address of the secondary instance in AZ2.

So the endpoint itself does not change; it simply resolves to a different IP address. Hence, it is always recommended to connect using the instance endpoint rather than an IP address, to avoid connectivity issues after a failover.
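One way to observe this during a test is to poll DNS for the endpoint while the failover runs and note when the resolved IP address changes. The sketch below uses only the Python standard library. The function name `watch_endpoint` and its parameters are made up for this example, and results can be skewed by DNS caching on the client or in the resolver, so treat the timings as approximate.

```python
import socket
import time


def watch_endpoint(hostname, polls, interval=1.0,
                   resolve=socket.gethostbyname, sleep=time.sleep):
    """Poll DNS for `hostname` and return the distinct IPs in the order they appeared."""
    seen = []
    for attempt in range(polls):
        try:
            ip = resolve(hostname)
        except OSError:
            ip = None  # lookup failed on this attempt; keep polling
        # Record the IP only when it differs from the last one recorded.
        if ip is not None and (not seen or seen[-1] != ip):
            seen.append(ip)
        if attempt < polls - 1:
            sleep(interval)
    return seen


# Example usage (substitute your own instance endpoint; the <account> and
# <region> placeholders are as in the example above):
#
#   ips = watch_endpoint("test.<account>.<region>.rds.amazonaws.com", polls=120)
#   print(ips)  # two entries would suggest the endpoint moved to the standby's IP
```

The `resolve` and `sleep` parameters are injectable only so the helper can be exercised without real DNS lookups or waiting.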

Hope this clears up the query around Multi-AZ failover testing.

AWS
answered a month ago
