RDS Aurora MySQL multi-master downtime and best practices

Hi all,

---Context--- A customer is currently using a large RDS Aurora cluster (12 replicas: 6x r5.12xlarge and 6x r5.4xlarge) for their production environment. This cluster is part of a monolith that is proactively (and slowly) being broken down into smaller applications with independent data stores. That work will still take months or years to complete due to competing priorities on their end.

---Challenge--- Over the past few months the customer has performed a few database restarts, either for engine upgrades or for database parameter tuning. The customer would like to evaluate multi-master, or any other alternative, that mitigates service downtime as much as possible for future upgrades or restarts.

---Questions---

  1. Is multi-master (2 nodes) + 12 additional read replicas an option at all?
  2. If we ever implement a multi-master approach and keep the remaining replicas as readers, how does a database upgrade/restart affect the service? Are all the database nodes rebooted as well, as happens with a regular single-master cluster?
  3. The customer application is not built for a multi-master active-active approach, as they won't be able to handle deadlocks at the application level. Is multi-master active-passive an option for fail-over?
  4. Do we have any other recommendation/architecture for managing database upgrades/restarts that would help minimize downtime?

Thanks!

AWS
asked 4 years ago · 2667 views
1 Answer
Accepted Answer

1), 2)

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html#aurora-multi-master-overview

In a multi-master cluster, all DB instances can perform write operations. The notions of a single read/write primary instance and multiple read-only Aurora Replicas don't apply.

A multi-master cluster does not have read replicas; it has two nodes, both read/write.

Because binlog replication is not available, replicating to EC2 is also not possible. As a result, it is currently difficult to scale out read workloads on multi-master.

Yes, active-passive workloads minimize downtime for write operations. However, if one of the nodes dies, the mechanism for accessing the other node is the application's responsibility. Cluster endpoints are not used for DML in multi-master.
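As a rough illustration of what "the application's responsibility" could look like in an active-passive setup, here is a minimal Python sketch. It assumes the application holds an ordered list of instance endpoints (active node first) and a `connect` callable supplied by whatever MySQL driver is in use. The names `connect_active_passive` and `AllNodesUnavailable` and the endpoint strings are hypothetical, not part of any AWS API.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")


class AllNodesUnavailable(Exception):
    """Raised when none of the configured nodes accepts a connection."""


def connect_active_passive(
    endpoints: Sequence[str],
    connect: Callable[[str], Conn],
    retriable: Tuple[type, ...] = (ConnectionError, TimeoutError),
) -> Tuple[str, Conn]:
    """Try each instance endpoint in order; return (endpoint, connection).

    The first endpoint is treated as the active node. The passive node is
    only tried when connecting to the active one fails with a retriable error.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint)
        except retriable as exc:
            # Remember the failure and fall through to the next node.
            errors.append((endpoint, exc))
    raise AllNodesUnavailable(errors)
```

In practice `connect` would wrap the driver's own connect call with a short connect timeout, and the application would send all writes to whichever endpoint this function returns, so the two nodes are never written to concurrently.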

Check out the other limitations as well.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html#aurora-multi-master-limitations

How many seconds of downtime are acceptable? You may want to review DB connection management first. There are best practices for DNS caching, smart drivers, etc.

https://d1.awsstatic.com/whitepapers/RDS/amazon-aurora-connection-management-handbook.pdf
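To make the DNS caching point concrete, the sketch below reconnects with exponential backoff and resolves the endpoint name again on every attempt instead of reusing a previously resolved IP address. It is only an illustration: `resolve_fresh` and `connect_with_backoff` are hypothetical helper names, the delay values are arbitrary, and the OS or runtime resolver may still cache answers on its own.

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def resolve_fresh(host: str, port: int = 3306) -> str:
    """Resolve the endpoint now rather than reusing an old IP address."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def connect_with_backoff(
    host: str,
    connect: Callable[[str], T],
    resolve: Callable[[str], str] = resolve_fresh,
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(ip), re-resolving host before every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(resolve(host))
        except OSError as exc:  # includes ConnectionError and socket.gaierror
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff, capped at max_delay seconds.
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise last_error
```

The `resolve` and `sleep` parameters are injectable mainly so the behaviour can be exercised without a real database or real waiting. With a real driver, `connect` would open the connection to the resolved address.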

answered 4 years ago
EXPERT
reviewed 5 months ago
