We are experiencing recurring Aurora PostgreSQL crashes caused by buffer cache mapping errors.
- The issue first occurred about 6 months ago.
- On August 7, 2025, we observed the error again (1 occurrence).
- On August 14, 2025, the same error occurred twice, each time causing the database to restart.
- These crashes happened during peak traffic hours, causing application downtime.
2025-08-14 16:43:05 UTC:[local]:rdsadmin@rdsadmin:[631]:PANIC: Failed to update mapping of buffer cache segments in aurora storage, err: 12.
2025-08-14 16:43:05 UTC:[local]:rdsadmin@rdsadmin:[631]:STATEMENT: Select * from aurora_scale_buffer_cache($1)
2025-08-14 16:43:06 UTC::@:[584]:LOG: server process (PID 631) was terminated by signal 6: Aborted
2025-08-14 16:43:06 UTC::@:[584]:DETAIL: Failed process was running: Select * from aurora_scale_buffer_cache($1)
2025-08-14 16:43:06 UTC::@:[584]:LOG: terminating any other active server processes
2025-08-14 16:43:06 UTC::@:[584]:FATAL: Aurora Runtime process unexpectedly exited
2025-08-14 16:43:06 UTC::@:[584]:LOG: database system is shut down
We checked system metrics (CPU, memory, storage) around the time of the incidents - all were within normal ranges. Each error led to an Aurora runtime crash and forced database restart
Please help investigate the root cause of these buffer cache errors. Are there known issues or bugs with aurora_scale_buffer_cache or the buffer cache mapping process? What mitigation steps or configuration changes can we apply to prevent future crashes?