
RDS PostgreSQL stuck in "Starting" state for hours after minor version upgrade (15.12 to 15.17)


Hi everyone,

I am seeking guidance on an issue we encountered while testing a database upgrade. We recently received a notification from AWS that our RDS PostgreSQL minor version (15.12) is being deprecated, so we are preparing to upgrade our instances from 15.12 to 15.17.

Before touching production, we tested the process on our staging instance. The upgrade appeared successful, but the instance is now completely unresponsive.

TIMELINE:

  • May 07, 23:32 (UTC+06:00): Upgraded the staging instance from PostgreSQL 15.12 to 15.17.
  • Post-Upgrade: According to the RDS Events, the upgrade, post-upgrade tasks, and automated backup all completed successfully.
  • Instance Stopped: I stopped the staging instance after the upgrade to avoid unnecessary billing costs overnight.
  • May 08, 11:00 (UTC+06:00): I attempted to start the staging instance again for testing purposes.

It has now been stuck in the "Starting" state for 4 to 5 hours.

OBSERVED BEHAVIOR:

  • The instance remains indefinitely in the "Starting" state.
  • Reboot, Start, and Modify actions are disabled.
  • The console shows the instance as "not in a modifiable state."
  • PostgreSQL logs are not being generated or visible.
  • We cannot increase storage or perform any modifications.

Since we are on the Basic Support plan, we cannot open a technical support ticket to have AWS engineers look at the underlying infrastructure.

Has anyone experienced this specific behavior after a minor version upgrade and subsequent stop/start?

Specifically, we are trying to figure out:

  1. Is this likely a silent internal RDS recovery process, or did the instance launch onto degraded underlying hardware?
  2. Is there any way to force a state change or access deeper logs when the console is locked out?
  3. Given this happened on staging, are there any specific precautions we should take before attempting this on our production instance to prevent the same lockup?

Any insights or troubleshooting tips would be greatly appreciated. Thank you!

asked a month ago · 68 views

2 Answers

Additional Troubleshooting Steps:

  • Check CloudWatch Metrics: Even while the instance is in the "Starting" state, check the ReadIOPS/WriteIOPS and EBSByteBalance% metrics. If you see activity, the database may be performing recovery or block hydration, so do not interrupt it. If activity is zero, the instance may be stuck on degraded hardware. Note that no datapoints at all is not the same as zero activity.
  • The "Post-Upgrade Stop" Trap: Stopping an instance soon after a minor upgrade can interrupt internal processes that are still running (such as WAL writes or post-upgrade housekeeping). This may have triggered a lengthy recovery on restart.
  • Actionable Alternative: Since you are on Basic Support, don't wait indefinitely. Use point-in-time recovery (PITR) to restore a new instance to a time just before the upgrade. This is often faster than waiting for a stuck instance to resolve itself.
  • CLI Trigger: Try modifying a non-critical setting (such as the security group) via the AWS CLI. This can occasionally prompt the RDS control plane to refresh the instance state when the console is unresponsive, although the API may reject the request with the same "not in a modifiable state" error.
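The CloudWatch check and the PITR fallback above can be sketched in Python. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; boto3 is not mentioned in the answer, and the function names are hypothetical. `has_recent_io` is pure logic that classifies the "Datapoints" list returned by CloudWatch `get_metric_statistics`, treating an empty list as "unknown" rather than "idle", because RDS may not publish metrics while an instance is starting. Whether the PITR API accepts a request while the source instance is stuck in "starting" is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def has_recent_io(datapoints, threshold=0.0):
    """Classify CloudWatch datapoints as active (True), idle (False) or unknown (None).

    `datapoints` has the shape of the "Datapoints" list returned by
    CloudWatch get_metric_statistics: dicts carrying an "Average" key.
    An empty list means nothing was reported, which is not the same as
    zero I/O, so it is reported as None rather than False.
    """
    if not datapoints:
        return None
    return any(dp.get("Average", 0.0) > threshold for dp in datapoints)


def fetch_metric(instance_id, metric_name, minutes=60):
    """Return recent 5-minute averages of an AWS/RDS metric (needs boto3 + credentials)."""
    import boto3  # imported here so has_recent_io() can be used without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )
    return response["Datapoints"]


def restore_to_point_in_time(source_id, target_id, restore_time):
    """Start a PITR restore into a NEW instance; the stuck source is left untouched."""
    import boto3

    rds = boto3.client("rds")
    return rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_id,
        TargetDBInstanceIdentifier=target_id,
        RestoreTime=restore_time,  # timezone-aware datetime, must be in the restorable window
    )
```

For example, `has_recent_io(fetch_metric("my-staging-db", "ReadIOPS"))` (hypothetical identifier) returns `True`, `False` or `None`. For the restore, the upgrade started at 23:32 UTC+06:00, which is 17:32 UTC, so a `RestoreTime` a little before 17:32 UTC on May 07 would land before the upgrade.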
EXPERT

answered a month ago

EXPERT

reviewed a month ago


I understand this is a concerning situation. While I don't have specific information about instances being stuck in a "Starting" state after a minor version upgrade, I can provide some general guidance based on RDS PostgreSQL upgrade best practices.

Regarding your specific questions:

  1. Silent recovery vs. hardware issues: Without access to the underlying infrastructure logs, it's difficult to determine the exact cause. However, extended "Starting" states can sometimes indicate the database is performing recovery operations, especially after a stop/start cycle following an upgrade.

  2. Forcing state changes: Unfortunately, when an RDS instance is in a non-modifiable state, your options through the console are limited. The instance needs to complete whatever internal process it's executing before it becomes responsive again.

  3. Precautions for production: For your production upgrade, I strongly recommend:

  • Schedule the upgrade during a maintenance window with low traffic
  • Avoid stopping the instance immediately after the upgrade completes; allow it to stabilize and verify full functionality first
  • Don't count on RDS Extended Support to buy time here: it applies to major versions after their standard support ends (at additional cost) and does not postpone the deprecation of a minor version such as 15.12
  • Take a manual snapshot before beginning the upgrade process
  • Test database connectivity and application functionality thoroughly on staging before proceeding to production
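The "manual snapshot first, then upgrade in the maintenance window" precautions can be sketched as follows. This is a minimal sketch, assuming boto3 and configured credentials; the helper names and the snapshot naming scheme are hypothetical. The pure functions only build the keyword arguments for the RDS `create_db_snapshot` and `modify_db_instance` calls, so they can be reviewed (or tested) before anything touches production. `ApplyImmediately=False` defers the engine change to the next maintenance window.

```python
import re
from datetime import datetime, timezone


def snapshot_identifier(instance_id, label, when):
    """Build a snapshot name that satisfies RDS naming rules.

    RDS snapshot identifiers must start with a letter, contain only ASCII
    letters, digits and hyphens, and must not contain "--" or end with "-".
    """
    raw = f"{instance_id}-{label}-{when:%Y%m%d-%H%M}"
    cleaned = re.sub(r"[^A-Za-z0-9-]", "-", raw)         # e.g. "15.17" -> "15-17"
    cleaned = re.sub(r"-{2,}", "-", cleaned).strip("-")  # collapse and trim hyphens
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "snap-" + cleaned
    return cleaned[:255].rstrip("-")


def upgrade_requests(instance_id, target_version, when):
    """Return (create_db_snapshot kwargs, modify_db_instance kwargs)."""
    snapshot_kwargs = {
        "DBInstanceIdentifier": instance_id,
        "DBSnapshotIdentifier": snapshot_identifier(
            instance_id, f"pre-{target_version}", when
        ),
    }
    modify_kwargs = {
        "DBInstanceIdentifier": instance_id,
        "EngineVersion": target_version,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": False,
    }
    return snapshot_kwargs, modify_kwargs


def snapshot_then_upgrade(instance_id, target_version):
    """Take a manual snapshot, wait until it is available, then request the upgrade."""
    import boto3

    rds = boto3.client("rds")
    snapshot_kwargs, modify_kwargs = upgrade_requests(
        instance_id, target_version, datetime.now(timezone.utc)
    )
    rds.create_db_snapshot(**snapshot_kwargs)
    rds.get_waiter("db_snapshot_available").wait(
        DBSnapshotIdentifier=snapshot_kwargs["DBSnapshotIdentifier"]
    )
    rds.modify_db_instance(**modify_kwargs)
```

Waiting for the snapshot to become available before calling `modify_db_instance` guarantees a restore point exists even if the upgraded instance later misbehaves the way the staging instance did.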

Important considerations:

When performing minor version upgrades, it's recommended to schedule them during periods of low traffic and to make sure your applications retry database connections (and any AWS SDK calls) with exponential backoff and jitter, so they can ride out temporary connectivity loss during the upgrade.
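A retry loop with exponential backoff and "full jitter" might look like the sketch below. The `connect` argument is any zero-argument callable that raises on failure, for example a wrapper around your PostgreSQL driver; the parameter defaults are illustrative assumptions, not AWS recommendations. `sleep` and `rng` are injectable so the timing can be checked without actually waiting.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=6, base=0.5, cap=30.0,
                         retry_on=(Exception,), sleep=time.sleep, rng=random.random):
    """Call connect() until it succeeds, using exponential backoff with full jitter.

    After failed attempt n (0-based) it sleeps a random time in
    [0, min(cap, base * 2**n)) before retrying; the last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(rng() * min(cap, base * 2 ** attempt))
```

With psycopg 3, for instance, usage could be `connect_with_backoff(lambda: psycopg.connect(dsn), retry_on=(psycopg.OperationalError,))`, where `dsn` is your connection string. The random delay spreads reconnect attempts from many clients so they don't all hit the instance at the same moment it comes back.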

Given that you're on Basic Support and cannot access technical support, you may need to wait for the instance to complete its internal processes. If the instance remains stuck beyond a reasonable timeframe (typically several hours), you may need to consider restoring from your pre-upgrade snapshot.
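"Wait, but not indefinitely" can be made explicit with a polling loop that gives up after a deadline, at which point you fall back to a snapshot or PITR restore. This is a sketch under assumptions: boto3 and credentials are available, and the six-hour default is an arbitrary stand-in for "several hours". boto3 also ships a `db_instance_available` waiter; the hand-written loop simply makes the multi-hour deadline and the fallback decision visible.

```python
import time


def wait_for_status(get_status, target="available", timeout_s=6 * 3600,
                    poll_s=60, clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns `target` or `timeout_s` elapses.

    Returns True when the target status is seen and False on timeout, so the
    caller can decide whether to fall back to a snapshot or PITR restore.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def rds_status_getter(instance_id):
    """Return a callable that reads DBInstanceStatus (e.g. "starting", "available")."""
    import boto3

    rds = boto3.client("rds")

    def get_status():
        response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
        return response["DBInstances"][0]["DBInstanceStatus"]

    return get_status
```

For example, `wait_for_status(rds_status_getter("my-staging-db"))` (hypothetical identifier) returns `False` if the instance is still not "available" after six hours.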

For future upgrades, avoiding the stop/start cycle immediately after an upgrade and allowing the instance to fully stabilize first may help prevent this issue.
Sources
Performing a minor version upgrade - Amazon Aurora

answered a month ago

EXPERT

reviewed a month ago
