I want to determine what caused my Amazon Aurora PostgreSQL-Compatible Edition database (DB) instances to unexpectedly restart or fail over, and prevent future occurrences.
Short description
Aurora PostgreSQL-Compatible DB instances can unexpectedly restart or fail over because of hardware failures, high resource utilization, replication lag, or software issues.
Resolution
Check Amazon RDS events for your DB instance
Complete the following steps:
- Open the Amazon Relational Database Service (Amazon RDS) console.
- In the navigation pane, choose Events.
- Look for events that occurred around the time that your DB instance restarted.
For more information, see Viewing Amazon RDS events and Working with Amazon RDS event notification.
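The console steps above can also be scripted. The following is a minimal sketch, assuming you pass in a boto3 RDS client (for example, `boto3.client("rds")`) and already know roughly when the restart happened. The function name `restart_events`, the `KEYWORDS` list, and the 30-minute window are illustrative assumptions, not part of any AWS API. The sketch reads only the first page of results from `describe_events`.

```python
from datetime import timedelta

# Substrings to look for in event messages. This list is an assumption for
# illustration; adjust it to the messages that appear in your account.
KEYWORDS = ("restart", "failover", "shutdown", "recovery")


def restart_events(rds_client, instance_id, restart_time, window_minutes=30):
    """Return (date, message) pairs for events near restart_time that match KEYWORDS."""
    start = restart_time - timedelta(minutes=window_minutes)
    end = restart_time + timedelta(minutes=window_minutes)
    # describe_events filters by source and time range on the service side.
    response = rds_client.describe_events(
        SourceIdentifier=instance_id,
        SourceType="db-instance",
        StartTime=start,
        EndTime=end,
    )
    matches = []
    for event in response.get("Events", []):
        message = event.get("Message", "")
        if any(word in message.lower() for word in KEYWORDS):
            matches.append((event.get("Date"), message))
    return matches
```

Because the client is a parameter, you can try the function against a stub before you point it at a real account.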
Analyze DB instance metrics
Complete the following steps:
- Open the Amazon RDS console.
- In the navigation pane, choose Databases.
- Select your DB instance.
- Choose the Monitoring tab.
- Review the following metrics:
  - For CPUUtilization, check for sustained high CPU usage.
  - For DatabaseConnections, verify that the connection count isn't approaching the max_connections limit.
  - For FreeableMemory, confirm that sufficient memory is available.
  - For ReadIOPS and WriteIOPS, look for unusual I/O patterns.
  - (Reader instances only) For AuroraReplicaLag, check replication lag values.
- Look for anomalies or spikes that might have initiated the restart.
For more information, see Monitoring Amazon Aurora metrics with Amazon CloudWatch.
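To compare several metrics over the same window, you can pull them from CloudWatch programmatically. The following is a minimal sketch, assuming a boto3 CloudWatch client (for example, `boto3.client("cloudwatch")`). The function name `metric_range` and the 60-minute lookback are illustrative assumptions. The sketch reports both the highest and lowest one-minute values because a peak matters for CPUUtilization and a trough matters for FreeableMemory.

```python
from datetime import timedelta


def metric_range(cloudwatch_client, instance_id, metric_name, restart_time,
                 lookback_minutes=60):
    """Return the min and max of an AWS/RDS metric in the period before restart_time.

    Returns None when CloudWatch has no datapoints for the window.
    """
    start = restart_time - timedelta(minutes=lookback_minutes)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=start,
        EndTime=restart_time,
        Period=60,  # one-minute granularity
        Statistics=["Maximum", "Minimum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    return {
        "max": max(point["Maximum"] for point in datapoints),
        "min": min(point["Minimum"] for point in datapoints),
    }
```

For example, you might call the function once each for `CPUUtilization`, `FreeableMemory`, and `DatabaseConnections` with the same `restart_time` and compare the results side by side.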
Review CloudWatch Database Insights
Complete the following steps:
- Open the Amazon RDS console.
- In the navigation pane, choose Database Insights.
- In the Database Insights pane, select your DB instance from the list.
- Analyze the top SQL queries and wait events around the time of the restart.
For more information, see Monitoring Amazon Aurora databases with CloudWatch Database Insights.
Check for hardware issues
If you suspect a hardware failure, then contact AWS Support to troubleshoot the issue. AWS Support can check whether a host-level issue initiated the restart.
Review DB logs
Complete the following steps:
- Open the Amazon RDS console.
- In the navigation pane, choose Databases.
- Select your DB instance.
- Choose the Logs & events tab.
- In the Logs section, download and review the PostgreSQL log files for errors or warnings around the time of the restart.
For more information about Aurora PostgreSQL database log files, see Aurora PostgreSQL database log files.
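If the log files are large, it can help to filter them for high-severity entries. The following is a minimal sketch, assuming a boto3 RDS client and a log file name that you already found in the Logs section. The function names and the `SUSPICIOUS` pattern are illustrative assumptions. The pattern matches PostgreSQL `ERROR`, `FATAL`, and `PANIC` lines and the "terminated by signal" message that PostgreSQL writes when a server process is killed.

```python
import re

# Severity tags plus the message PostgreSQL logs when a backend process is killed.
SUSPICIOUS = re.compile(r"\b(?:PANIC|FATAL|ERROR):|terminated by signal")


def suspicious_lines(log_text):
    """Return the log lines that match the SUSPICIOUS pattern."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


def read_log_file(rds_client, instance_id, log_file_name):
    """Download a whole log file by following the pagination marker."""
    marker = "0"  # "0" starts from the beginning of the file
    chunks = []
    while True:
        response = rds_client.download_db_log_file_portion(
            DBInstanceIdentifier=instance_id,
            LogFileName=log_file_name,
            Marker=marker,
        )
        chunks.append(response.get("LogFileData") or "")
        if not response.get("AdditionalDataPending"):
            break
        marker = response["Marker"]
    return "".join(chunks)
```

A typical use is `suspicious_lines(read_log_file(client, instance_id, log_file_name))`, followed by a manual review of the lines nearest the restart time.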
Check for pending maintenance
Complete the following steps:
- Open the Amazon RDS console.
- In the navigation pane, choose Databases.
- Select your DB instance.
- Choose the Maintenance & backups tab.
- If there's pending maintenance, then choose Apply now or Apply at next maintenance window.
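The same check can be done through the API. The following is a minimal sketch, assuming a boto3 RDS client and the Amazon Resource Name (ARN) of your DB instance or DB cluster. The function names and the shape of the returned dictionaries are illustrative assumptions. The `OptInType` values `immediate` and `next-maintenance` correspond to the Apply now and Apply at next maintenance window choices.

```python
def pending_actions(rds_client, resource_arn):
    """List the pending maintenance actions for one resource as simple dictionaries."""
    response = rds_client.describe_pending_maintenance_actions(
        ResourceIdentifier=resource_arn
    )
    actions = []
    for resource in response.get("PendingMaintenanceActions", []):
        for detail in resource.get("PendingMaintenanceActionDetails", []):
            actions.append({
                "resource": resource.get("ResourceIdentifier"),
                "action": detail.get("Action"),
                "description": detail.get("Description"),
                "forced_apply_date": detail.get("ForcedApplyDate"),
            })
    return actions


def apply_action(rds_client, resource_arn, action, now=False):
    """Opt in to a pending action, either immediately or at the next maintenance window."""
    opt_in_type = "immediate" if now else "next-maintenance"
    return rds_client.apply_pending_maintenance_action(
        ResourceIdentifier=resource_arn,
        ApplyAction=action,
        OptInType=opt_in_type,
    )
```

Defaulting `now` to `False` is a deliberate choice in this sketch so that a call doesn't apply maintenance immediately unless you ask for it.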
Create an alarm
Set up Amazon CloudWatch alarms for critical metrics, such as CPUUtilization, FreeableMemory, and AuroraReplicaLag, so that you're notified before resource pressure leads to a restart.
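As one way to create such an alarm, the following is a minimal sketch that assumes a boto3 CloudWatch client. The function name `create_alarm`, the alarm naming scheme, the 5-minute period, and the three evaluation periods are illustrative assumptions that you should tune for your workload. The optional `sns_topic_arn` is a hypothetical Amazon SNS topic that receives the notification.

```python
def create_alarm(cloudwatch_client, instance_id, metric_name, threshold,
                 comparison="GreaterThanThreshold", sns_topic_arn=None):
    """Create a CloudWatch alarm on an AWS/RDS metric and return the parameters used."""
    params = {
        "AlarmName": f"{instance_id}-{metric_name}",
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # evaluate 5-minute averages
        "EvaluationPeriods": 3,  # require 3 consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": comparison,
    }
    if sns_topic_arn:
        # Notify the topic when the alarm enters the ALARM state.
        params["AlarmActions"] = [sns_topic_arn]
    cloudwatch_client.put_metric_alarm(**params)
    return params
```

For CPUUtilization, the default `GreaterThanThreshold` comparison fits. For FreeableMemory, pass `comparison="LessThanThreshold"` with a threshold in bytes, because low values are the concern.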
Optimize resource usage
Take the following actions:
Scale your resources
If you consistently see high resource utilization, then scale up to a larger DB instance class or add Aurora Replicas to offload read traffic.
Keep your database updated
To resolve bugs and improve performance, regularly apply patches and version upgrades.
Implement high availability
For a single instance, use Multi-AZ DB instance deployments. For Aurora clusters, make sure that you have at least one reader instance that Aurora can promote in case of writer instance issues.
For more information, see High availability for Amazon Aurora.
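To confirm that a cluster has a reader that Aurora can promote, you can inspect the cluster members. The following is a minimal sketch, assuming a boto3 RDS client and a DB cluster identifier. The function name `failover_candidates` is an illustrative assumption. The sketch orders readers by promotion tier, where a lower tier number means a higher failover priority.

```python
def failover_candidates(rds_client, cluster_id):
    """Return reader instance identifiers for a cluster, ordered by promotion tier."""
    response = rds_client.describe_db_clusters(DBClusterIdentifier=cluster_id)
    clusters = response.get("DBClusters", [])
    if not clusters:
        return []
    members = clusters[0].get("DBClusterMembers", [])
    readers = [m for m in members if not m.get("IsClusterWriter")]
    # Lower PromotionTier values are promoted first; treat a missing tier as lowest priority.
    readers.sort(key=lambda m: m.get("PromotionTier", 15))
    return [m["DBInstanceIdentifier"] for m in readers]
```

An empty list means that the cluster has no reader instance to promote if the writer instance has issues.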
Related information
Monitoring tools for Amazon Aurora
How do I troubleshoot issues that cause my Aurora read replica to lag and restart?
Fast failover with Amazon Aurora PostgreSQL