Memory leaks when upgrading from Aurora MySQL 2.10.0 to 3.01.0

0

We recently upgraded a cluster from MySQL 5.7 (Aurora 2.10.0) to MySQL 8.0.23 (3.01.0) using a snapshot restore, as we could really benefit from the new features of MySQL 8. The moment the new cluster became available, memory usage on our writer instance started decreasing until it started consuming swap, and then once it ran out of swap, AWS initiated an automatic failover. This cycle has been repeating since we've performed the upgrade.

We have a separate test cluster we upgraded (using the same snapshot), and it does not appear to be experiencing the same issue. That cluster does not have much load on it, and also wouldn't be used for replication to other databases.

We tried increasing our instance sizes (db.t3.large to db.r5.large), turning off our cross region read replicas, turning off our DMS tasks, tweaking some memory settings, and opened a support ticket, but so far none of these avenues have helped us out at all.

Has anyone else run into this issue? Are there other troubleshooting steps we could try?

asked 2 years ago853 views
3 Answers
1
Accepted Answer
AWS
answered 2 years ago
profile picture
EXPERT
reviewed 11 hours ago
  • We just upgraded to Aurora 3.02.0 and turned our binlogs back on, and it appears that the memory usage is remaining stable. Thanks!

1

Hi,

I understand troubleshooting an issue of this nature can be frustrating, but we're here to help. Considering the issue is specific to an AWS Account/Aurora Cluster, we will need to review the cluster metrics and logs to understand where memory is being utilized. I see an investigation is on-going through the support case created on this topic. I have triggered an internal engagement with the Aurora development team to aid with investigating the behavior you're seeing with memory consumption.

Please refer to the case in the Support Center for the latest updates: https://console.aws.amazon.com/support/home#/

If you have any other questions or concerns, please engage us via your case as this is being actively monitored for the quickest turn around time. Thank you for your understanding and patience in working with us concerning this.

AWS
Shiv
answered 2 years ago
1

Same issues here. We raised a ticket with AWS who (eventually) responded confirming it's a bug with Aurora 3.0.1.0. A potential workaround is to disable binlog replication which isn't an option for us. A fix is pending "some time in Q1 2022", which is incredibly helpful when you're relying on this for production use.

answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions