
RDS instance randomly crashing 3 times per day

0

This is in the "Recent Events" for the RDS instance. This happens daily:

  • April 21, 2026, 23:33 (UTC-05:00) Recovery of the DB instance is complete.
  • April 21, 2026, 23:33 (UTC-05:00) DB instance restarted
  • April 21, 2026, 23:29 (UTC-05:00) Recovery of the DB instance has started. Recovery time will vary with the amount of data to be recovered.
  • April 21, 2026, 09:15 (UTC-05:00) DB Instance wtbeta contains MyISAM tables that have not been migrated to InnoDB. These tables can impact your ability to perform point-in-time restores. Consider converting these tables to InnoDB. Please refer to http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.BackingUpAndRestoringAmazonRDSInstances.html#Overview.BackupDeviceRestrictions
  • April 21, 2026, 07:04 (UTC-05:00) Recovery of the DB instance is complete.
  • April 21, 2026, 07:03 (UTC-05:00) DB instance restarted
  • April 21, 2026, 06:59 (UTC-05:00) Recovery of the DB instance has started. Recovery time will vary with the amount of data to be recovered.
  • April 21, 2026, 03:26 (UTC-05:00) Recovery of the DB instance is complete.
  • April 21, 2026, 03:25 (UTC-05:00) DB instance restarted
  • April 21, 2026, 03:21 (UTC-05:00) Recovery of the DB instance has started. Recovery time will vary with the amount of data to be recovered.

I checked the error log, which is streamed to CloudWatch, and I don't see any errors related to a crash or recovery.

The instance maintains about the same amount of freeable memory all week, has sufficient storage space, and shows no CPU spikes above 65% either.

Please help!

  • If my answer was helpful, I would appreciate it if you could mark it as the accepted answer.

2 Answers
2

As far as I know, the pattern of these restarts (approx. 4-5 minutes from "started" to "complete") is typical for an RDS instance recovering from a process crash. While the general advice of the re:Post Agent (the other answer here in this thread) is a good starting point, here are the most likely technical root causes based on the behavior you described:

In short: Enable Enhanced Monitoring and the Slow Query Log. If you see a memory drop right before the crash, consider upgrading the instance class or optimizing the memory-intensive queries identified in the Slow Query Log.

1. Out of Memory (OOM) Killer

The most common reason for "random" crashes without entries in the DB error log is the OS-level OOM Killer. If the database process requests more memory than is available, the underlying Linux kernel terminates the process immediately.

  • Why you don’t see it in logs: The process is killed before it can write a "shutdown" or "error" log entry.
  • Action: Check your Enhanced Monitoring (not just standard CloudWatch). Set the granularity to 1 or 5 seconds. Look for a sharp drop in free memory and a spike in swap usage (FreeableMemory and SwapUsage in standard CloudWatch) at the moments recovery started: 03:21, 06:59, and 23:29.
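To make the "sharp drop right before the crash" check concrete, here is a minimal Python sketch. It assumes you have already exported memory samples as (timestamp, free bytes) pairs; the function name, the 5-minute window, the 50% threshold, and the sample values are all illustrative assumptions, not anything RDS provides.

```python
from datetime import datetime, timedelta

def memory_drop_before(samples, crash_time,
                       window=timedelta(minutes=5), drop_ratio=0.5):
    """Return True if free memory fell by at least drop_ratio, measured
    from the peak in the window to the last sample before crash_time.

    samples: list of (datetime, free_bytes) tuples sorted by time.
    """
    in_window = [free for ts, free in samples
                 if crash_time - window <= ts <= crash_time]
    if len(in_window) < 2:
        return False  # not enough data to judge a trend
    peak = max(in_window)
    last = in_window[-1]
    return peak > 0 and (peak - last) / peak >= drop_ratio

# Made-up samples around the 03:21 recovery start from the event log.
crash = datetime(2026, 4, 21, 3, 21)
samples = [
    (datetime(2026, 4, 21, 3, 17), 4_000_000_000),
    (datetime(2026, 4, 21, 3, 19), 2_500_000_000),
    (datetime(2026, 4, 21, 3, 21), 300_000_000),
]
print(memory_drop_before(samples, crash))
```

If this returns True for all three restart times, memory pressure becomes the leading suspect. If free memory is flat, move on to the other causes below.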

2. The MyISAM Impact

As indicated in your event log, MyISAM tables are not crash-safe.

  • The Issue: While MyISAM might not cause the crash, it can complicate the recovery. After an unclean shutdown, MyISAM tables may need to be checked and repaired, which could contribute to the several minutes your recovery takes.
  • Action: Convert these tables to InnoDB immediately. This will enable ACID compliance and much faster crash recovery.
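As a rough sketch of how the conversion could be scripted, the Python below holds a query that lists MyISAM tables outside the system schemas and a helper that builds one ALTER TABLE statement per table. The helper name and the example schema and table names are hypothetical. Note that changing the engine rebuilds the table, so you would probably want to try this on a snapshot or during a quiet period first.

```python
# Query to run on the instance to list MyISAM tables outside system schemas.
FIND_MYISAM_SQL = """
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys');
"""

def innodb_conversion_statements(tables):
    """Build one ALTER TABLE statement per (schema, table) pair."""
    statements = []
    for schema, table in tables:
        # Backticks quote identifiers in MySQL; double any embedded backtick.
        s = schema.replace("`", "``")
        t = table.replace("`", "``")
        statements.append(f"ALTER TABLE `{s}`.`{t}` ENGINE=InnoDB;")
    return statements

# Hypothetical result rows from FIND_MYISAM_SQL.
for stmt in innodb_conversion_statements([("appdb", "sessions"),
                                          ("appdb", "audit_log")]):
    print(stmt)
```

Review the generated statements before running them, and convert large tables one at a time so you can watch the impact.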

3. Hidden Application Schedules

Since the restarts happen at specific times but more than once a day, they are likely tied to application-side cron jobs, heavy reporting tasks, or ETL processes.

  • Cross-reference your application logs or task schedulers with these timestamps. Look for any process that might be opening a large number of connections or executing memory-intensive queries (e.g., large JOINs or GROUP BY without indexes).
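One way to do this cross-referencing, sketched in Python: collect the start times of your scheduled jobs and list which ones began shortly before each recovery start. The crash times below come from the event log in the question; the job names, their start times, and the 15-minute window are invented for illustration.

```python
from datetime import datetime, timedelta

def jobs_near_crashes(crashes, job_runs, window=timedelta(minutes=15)):
    """Map each crash time to the names of jobs that started within
    `window` before it.

    crashes: list of datetime
    job_runs: list of (job_name, start_datetime) tuples
    """
    result = {}
    for crash in crashes:
        result[crash] = sorted(
            name for name, started in job_runs
            if timedelta(0) <= crash - started <= window
        )
    return result

# Recovery start times taken from the "Recent Events" list.
crashes = [
    datetime(2026, 4, 21, 3, 21),
    datetime(2026, 4, 21, 6, 59),
    datetime(2026, 4, 21, 23, 29),
]
# Hypothetical job schedule; replace with data from your scheduler.
job_runs = [
    ("nightly_report", datetime(2026, 4, 21, 3, 15)),
    ("etl_sync", datetime(2026, 4, 21, 6, 50)),
    ("cache_warm", datetime(2026, 4, 21, 12, 0)),
]
for crash, names in jobs_near_crashes(crashes, job_runs).items():
    print(crash.strftime("%H:%M"), names)
```

A job that shows up before the same restart several days in a row is a good candidate to disable or throttle as a test.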

4. Check for "Burst Balance"

If you are using gp2 or gp3 storage and your workload is hitting the IOPS limit, the instance can become unresponsive, triggering an automated health-check restart by AWS.

  • Check the BurstBalance (gp2 only) and DiskQueueDepth metrics in CloudWatch around the restart times.
EXPERT
answered a month ago
0

Based on the pattern you're describing with regular restarts and recovery cycles occurring three times daily, here are several areas to investigate:

Check for Automated Maintenance or Backups: The timing of these restarts (around 3:21 AM, 6:59 AM, and 11:29 PM UTC-5) suggests they might be scheduled events. Verify your maintenance window settings and automated backup schedule, as these can cause brief unavailability.

Investigate Database Engine Logs: While you've checked the error log in CloudWatch, you should also examine other available logs depending on your database engine. For example, if this is an Oracle instance, you can use the RDS diagnostic procedures to list incidents and problems through the Automatic Diagnostic Repository. Look for any incidents that might correlate with the restart times.

Review Database-Specific Issues: The event mentioning MyISAM tables is significant. MyISAM tables don't support crash recovery as well as InnoDB tables do. If your database is experiencing any issues, MyISAM tables could be contributing to instability. Consider converting these tables to InnoDB as recommended in the event message.

Check for:

  • Any custom scripts or cron jobs that might be triggering restarts
  • Application-level connection issues that could cause the database to become unresponsive
  • Parameter group settings that might have thresholds causing automatic restarts
  • Enhanced Monitoring metrics for deeper insights into OS-level metrics
  • Any recent changes to your RDS configuration or application code

Contact AWS Support: Given the regular pattern and lack of obvious errors, this warrants opening a support case with AWS. They can review internal logs and metrics that aren't visible to you and help identify the root cause of these restarts.
Sources
Diagnosing problems with RDS for Oracle DB instances - Amazon Relational Database Service

answered a month ago
EXPERT
reviewed a month ago
