
Amazon DocumentDB automatic failover when CPU utilization reaches 100%


Because of some bad queries, our primary instance was overwhelmed: CPU utilization reached 100% and queries started timing out. We had to manually fail over to a replica instance.

We are wondering why automatic failover was not initiated in this case, or whether we need some additional configuration to enable it.

If automatic failover is not supported in this scenario, one solution I can see is triggering a Lambda function to force a failover when a CPU utilization alarm breaches its threshold. If someone has a simpler solution, it would be helpful.

Asked 1 year ago · 372 views
1 Answer
Accepted Answer

Hello.

"We are wondering why automatic failover was not initiated in this case, or whether we need some additional configuration to enable it."

The documentation below states that a failover occurs when the primary instance fails.
In other words, even if CPU utilization is high, a failover will not occur unless AWS judges the instance to have failed.
https://docs.aws.amazon.com/documentdb/latest/developerguide/failover.html#:~:text=When%20the%20primary%20instance%20fails,has%20its%20own%20endpoint%20address.

When the primary instance fails, Amazon DocumentDB automatically fails over to an Amazon DocumentDB replica, if one exists.

"If automatic failover is not supported in this scenario, one solution I can see is triggering a Lambda function to force a failover when a CPU utilization alarm breaches its threshold. If someone has a simpler solution, it would be helpful."

Even if you perform a manual failover, unless you identify and fix the root cause, CPU utilization will climb the same way on the new primary after the failover.
If you're okay with that, I think it's possible to automate the failover by using CloudWatch alarms, Amazon SNS, and Lambda.
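
As a rough illustration of the CloudWatch alarm, SNS, and Lambda chain described above, here is a minimal sketch of what the Lambda side might look like. It assumes the function is subscribed to the SNS topic that the CPU alarm publishes to, and that the cluster identifier is supplied through an environment variable. The variable name `DOCDB_CLUSTER_ID`, the helper `should_fail_over`, and the optional `client` parameter are hypothetical and not part of the original answer. The only AWS call used is `failover_db_cluster` on the boto3 `docdb` client.

```python
import json
import os


def should_fail_over(sns_event):
    """Return True if any SNS record carries a CloudWatch alarm in the ALARM state."""
    for record in sns_event.get("Records", []):
        # CloudWatch alarm notifications arrive as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            return True
    return False


def handler(event, context, client=None):
    """Lambda entry point: force a DocumentDB failover when the alarm fires."""
    if not should_fail_over(event):
        return {"failover": False}

    if client is None:
        # Imported lazily so the parsing logic can be exercised without AWS access.
        import boto3
        client = boto3.client("docdb")

    # Hypothetical environment variable holding the cluster identifier.
    cluster_id = os.environ["DOCDB_CLUSTER_ID"]

    # Without a target instance, DocumentDB picks the replica to promote.
    client.failover_db_cluster(DBClusterIdentifier=cluster_id)
    return {"failover": True, "cluster": cluster_id}
```

Because the new primary is likely to hit the same CPU load if the offending queries keep running, a setup like this would probably need some guard (for example, a longer alarm evaluation period) to avoid failing over repeatedly.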

Answered 1 year ago by an Expert
Reviewed 1 year ago by an Expert
