We recently moved from Provisioned Aurora 2.10.1 (MySQL 5.7) to Aurora 3.01.0 (MySQL 8.0). We have Enhanced Monitoring and Performance Insights enabled.
We've been monitoring the database metrics and noticed that Freeable Memory drops steadily until it is so low that Aurora triggers a failover. When this happens, there are no deadlocks and no sleeping or zombie processes. In Performance Insights, os.memory.free.avg decreases over time while os.memory.active.avg increases at the same rate, as expected. We also see os.memory.pageTables.avg increase at the same rate as os.memory.active.avg.
The database is smaller than the instance's total RAM, so the instance should be large enough to hold all of it in memory. We've even reduced innodb_buffer_pool_size to a value well below the instance memory but still large enough to cache the data, and it didn't help. The buffer cache hit ratio hovers around 100%, and swap usage stays at zero.
Only the writer is affected: the reader's Freeable Memory stays steady while the writer's drops. When Aurora promotes a reader to writer, the same thing happens on the new writer.
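To confirm that free and active memory really move in lockstep, and to get a rough idea of how long the writer has before the next failover, the exported samples can be fitted with a straight line. This is a minimal sketch, assuming the Performance Insights datapoints (for example os.memory.free.avg, os.memory.active.avg and os.memory.pageTables.avg) have already been exported as timestamp/value pairs in consistent units; the function names and the `floor` threshold are hypothetical.

```python
from typing import Optional, Sequence


def slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against times (value units per second)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var


def hours_until_exhausted(
    times: Sequence[float], free: Sequence[float], floor: float = 0.0
) -> Optional[float]:
    """Linear projection (hypothetical helper) of when free memory reaches `floor`.

    Returns None if free memory is flat or growing over the sampled window.
    """
    s = slope(times, free)
    if s >= 0:
        return None
    # Remaining headroom divided by the drain rate, converted from seconds to hours.
    return (free[-1] - floor) / -s / 3600.0
```

If the free-memory slope is roughly the negative of the active-memory slope, memory is being consumed steadily rather than in bursts, and the projection gives a window to compare against the observed failover times. A linear fit is a simplification: growth that accelerates over time will be underestimated.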
When I launch a new instance from a snapshot, the issue doesn't occur.
If MySQL's memory use is limited by the parameters in the parameter group, how can we determine whether the OS is using more memory than it should, and if it is, how do we stop it?
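One way to check whether memory use exceeds what the parameter group allows is to estimate MySQL's own ceiling from its server variables and compare it with the instance's RAM and the Enhanced Monitoring numbers. The sketch below uses a common rule-of-thumb formula (global buffers plus per-connection buffers times the connection count). It is an assumption, not Aurora's own memory accounting, and it ignores other consumers such as the Performance Schema. The variable values would come from SHOW GLOBAL VARIABLES; the function name is hypothetical.

```python
from typing import Dict, Optional

# Rule-of-thumb worst-case estimate (bytes); not how Aurora itself accounts for memory.
GLOBAL_BUFFERS = (
    "innodb_buffer_pool_size",
    "innodb_log_buffer_size",
    "key_buffer_size",
    "temptable_max_ram",
)
PER_CONNECTION_BUFFERS = (
    "sort_buffer_size",
    "join_buffer_size",
    "read_buffer_size",
    "read_rnd_buffer_size",
    "thread_stack",
    "binlog_cache_size",
)


def estimate_mysql_memory(
    variables: Dict[str, int], connections: Optional[int] = None
) -> int:
    """Global buffers plus per-connection buffers times the connection count.

    Missing variables count as 0. If `connections` is None, max_connections is
    used, which gives an upper bound that real workloads rarely reach.
    """
    global_bytes = sum(variables.get(name, 0) for name in GLOBAL_BUFFERS)
    per_conn = sum(variables.get(name, 0) for name in PER_CONNECTION_BUFFERS)
    n = variables.get("max_connections", 0) if connections is None else connections
    return global_bytes + n * per_conn
```

If the estimate, even at max_connections, sits well below the instance's RAM while Freeable Memory still drains, the growth is likely coming from something this formula doesn't cover, which narrows where to look.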