Thanks for the quick reply! I just finished creating a reader, so our cluster now has more than one instance across more than one AZ. Our last change was to move up to the latest available version of 2.x.x (5.7.mysql_aurora.2.11.0). We are using the default parameter group (default.aurora-mysql5.7) with max_connections, I believe, unchanged from the default (GREATEST({log(DBInstanceClassMemory/805306368)*45},{log(DBInstanceClassMemory/8187281408)*1000})).
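For reference, the default max_connections formula quoted above can be evaluated with a short sketch like the one below. It assumes that log is base 2 and that DBInstanceClassMemory is given in bytes. The function name is made up for illustration, and how the service rounds the result is not covered here.

```python
import math


def default_max_connections(db_instance_class_memory: float) -> float:
    """Evaluate GREATEST({log(M/805306368)*45},{log(M/8187281408)*1000}).

    Assumptions: log is base 2, and M (DBInstanceClassMemory) is in bytes.
    The result is returned unrounded.
    """
    if db_instance_class_memory <= 0:
        raise ValueError("DBInstanceClassMemory must be positive")
    # First branch of GREATEST: dominates for small instance classes.
    small_term = math.log2(db_instance_class_memory / 805306368) * 45
    # Second branch of GREATEST: dominates for larger instance classes.
    large_term = math.log2(db_instance_class_memory / 8187281408) * 1000
    return max(small_term, large_term)
```

Note that DBInstanceClassMemory is lower than the nominal memory of the instance class, because part of the memory is reserved for the operating system and management processes. The value actually in effect can be confirmed on the instance with SHOW VARIABLES LIKE 'max_connections';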
I will certainly take a closer look at the metrics and logs, but as far as I know the overall workload hasn't changed and the only reason I didn't dig deeper into instance limits is because these recovery actions only happen in the middle of the night (us-west-2) and just barely impact our primary workloads as they ramp up in the morning. But, like I said, the initial recovery happens hours after our day is over and hours before it begins.
I will look closer at the documents you provided to see if they point to any issues on our end. And, just to confirm, re:Post is our only option for support without a set support agreement on our account, correct? As in, 'open a support case' isn't really an option for us in this instance, right? Thanks, again!
First, you are correct that we recommend Multi-AZ deployments. If one host fails, you can fail over to the other instance, which protects against data loss and keeps downtime to a minimum.
You mention that the cluster was stable for months and then went through multiple recoveries recently. Without the instance logs it is difficult to determine the root cause. A hardware failure is possible. However, was there any change at all to the workload, or to any MySQL parameter (like max_connections), apart from the upgrade to 5.7? Are you using the latest Aurora MySQL database engine version [1]?
Have you checked the instance resources as well (CPU, memory, disk) to ensure they are below the instance limits [2]? Here is an article you can go through to review memory usage [3]. It specifies RDS MySQL but applies to Aurora MySQL as well.
If possible for you, opening a support case is a good way to get the instance reviewed by our support engineers who will be able to clarify if the issue is due to hardware failure or something else.
Hope it helps.
[1] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraMySQLReleaseNotes/AuroraMySQL.Updates.2110.html
[2] https://aws.amazon.com/premiumsupport/knowledge-center/view-cpu-memory-aurora/
[3] https://aws.amazon.com/premiumsupport/knowledge-center/low-freeable-memory-rds-mysql-mariadb/
Correct, you need at least a Developer plan to create a technical support case. The Basic Support plan includes 24x7 access to customer service, documentation, whitepapers, and AWS re:Post, but not technical support. Please remember to accept the answer if it helped you so that we can flag this question as answered. Hope that helps!