RDS Aurora MySQL multi-master downtime and best practices


Hi all,

---Context--- A customer is currently using a large RDS Aurora cluster (12 replicas: 6x r5.12xlarge and 6x r5.4xlarge) for their production environment. This cluster is part of a monolith that is proactively (and slowly) being broken down into smaller applications with independent data stores. That work will still take months or years to complete due to competing priorities on their end.

---Challenge--- Over the past few months the customer has performed a few database restarts due to either engine upgrades or different database parameter tuning. The customer would like to evaluate multi-master or any other alternative that mitigates service downtime as much as possible for future upgrades or restarts.

---Questions---

  1. Is multi-master (2 nodes) + 12 additional read replicas an option at all?
  2. If we ever implement a multi-master approach while keeping the remaining replicas as readers, how does a database upgrade/restart affect the service? Are all the database nodes rebooted, as happens with a regular single-master cluster?
  3. The customer application is not built for a multi-master active-active approach, as it won't be able to handle deadlocks at the application level. Is multi-master active-passive an option for failover?
  4. Do we have any other recommendation/architecture for managing database upgrades/restarts that would help minimize downtime?

Thanks!

1 Answer
Accepted Answer

1), 2)

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html#aurora-multi-master-overview

In a multi-master cluster, all DB instances can perform write operations. The notions of a single read/write primary instance and multiple read-only Aurora Replicas don't apply.

A multi-master cluster does not have read replicas; in the two-node setup you describe, both nodes are read/write.

Because binlog replication is not available, replicating to a MySQL instance on EC2 is not possible either. As a result, it is currently difficult to scale out read workloads on multi-master.

3)

Yes, an active-passive workload minimizes downtime for write operations. However, if one of the nodes dies, switching over to the other node is the application's responsibility. The cluster endpoint is not used for DML in multi-master.
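
Since the switch-over is left to the application, an active-passive client needs its own ordered list of instance endpoints and has to move to the next one when the current node stops accepting connections. Below is a minimal sketch of that idea in Python. The endpoint names, the `connect` callable (for example a thin wrapper around your MySQL driver), and the choice of `OSError` as the "node is down" signal are all assumptions for illustration and do not come from the Aurora documentation.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

Conn = TypeVar("Conn")


class NoNodeAvailable(Exception):
    """Raised when every endpoint in the list refused the connection."""


def connect_active_passive(
    endpoints: Sequence[str],
    connect: Callable[[str], Conn],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> Tuple[str, Conn]:
    """Try endpoints in priority order and return the first that connects.

    endpoints[0] is treated as the active node and the rest as passive
    standbys, so all writes go to a single node while it is healthy.
    retry_on is an assumption: pass the connection-error types that your
    driver actually raises.
    """
    errors = []
    for host in endpoints:
        try:
            return host, connect(host)
        except retry_on as exc:
            # Remember why this node failed, then fall through to the next.
            errors.append(f"{host}: {exc}")
    raise NoNodeAvailable("; ".join(errors))


if __name__ == "__main__":
    # Hypothetical instance endpoints; replace with your own.
    ENDPOINTS = ["node-1.example.internal", "node-2.example.internal"]

    def fake_connect(host: str) -> str:
        # Stand-in for a real driver call; simulates node-1 being down.
        if host == "node-1.example.internal":
            raise ConnectionRefusedError("node-1 is down")
        return f"connection to {host}"

    host, conn = connect_active_passive(ENDPOINTS, fake_connect)
    print(host, conn)
```

Keeping the priority order fixed means the application only writes to the second node while the first is unreachable, which avoids the active-active write conflicts mentioned in the question.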

Check out the other limitations as well.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html#aurora-multi-master-limitations

4)

How many seconds of downtime are acceptable? You may want to review DB connection management first. There are best practices for DNS caching, smart drivers, and so on.

https://d1.awsstatic.com/whitepapers/RDS/amazon-aurora-connection-management-handbook.pdf
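
A common client-side complement to those practices is to retry a dropped connection with exponential backoff instead of failing on the first error, so that a restart lasting a few seconds shows up as a short stall rather than an outage. The following is a rough sketch under stated assumptions: `connect` is a caller-supplied function, and the attempt count, delays, and `OSError` default are illustrative values, not figures taken from the handbook.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

Conn = TypeVar("Conn")


def connect_with_backoff(
    connect: Callable[[], Conn],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Conn:
    """Call connect() until it succeeds or the attempts run out.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and is
    capped at max_delay. retry_on is an assumption: pass the error types
    your driver raises for a lost or refused connection.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            # connect() should open a fresh connection each time so that
            # the hostname is resolved again instead of reusing a stale IP.
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` is enough.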

Answered 4 years ago
Expert
Reviewed 6 months ago
