RDS Postgres - Rows Reverting or Resetting Periodically after upgrade

0

Issue Summary

We are experiencing an issue with our AWS RDS PostgreSQL database (Postgres version 16 upgraded to 17 a while ago - likely when the problem started) where rows in a specific table periodically revert or reset their state. Here's a detailed breakdown of the situation and what we've investigated so far:


Observations

  1. Row Updates:
    • Rows in the table (chapter_metadata) have an FK column (chapter_id) where values (UUIDs) are set correctly after an INSERT or UPDATE. Trying to use a non-existing FK raises an error as expected.
    • After a few minutes, these values are progressively set to NULL. This happens incrementally and not all at once.
  2. pg_stat_activity Logs:
    • We observed a ROLLBACK query initiated by the rdsadmin user;
    • The rdsadmin activity raises questions about internal RDS behavior;
  3. Connection Details:
    • The connection is confirmed to point to the primary (read/write) database and not a replica (this is the main DB of a replica setup);
    • SSL connections were also tested, with the same results, but it important to mention that during the upgrade the param rds.force_ssl was evaluated as true, even if it was set to false (0) on the parameter group. A reboot solved rds.force_ssl issue when it occurred;
    • No triggers or cascading foreign key actions are active on the affected column anymore (FK was dropped as part of troubleshooting);

Troubleshooting Steps Taken

  1. Snapshot/Backup Events:
    • No recent snapshot or backup-related events occurred near the times of these resets, except a daily automatic backup recorded hours before;
    • Tried rebooting the DB more than once;
  2. Replication Health:
    • The database has one read replica, and pg_stat_replication shows normal streaming with negligible lag. We also tried removing the read replica completely, and it did not solve the issue;
  3. Active Transactions Monitoring:
    • Using pg_stat_activity, only the following relevant activities were observed:
      • One user query from appadmin (my user);
      • One START_REPLICATION process from rdsrepladmin;
      • Occasionally, the rdsadmin user executing a ROLLBACK.
  4. Database Parameters:
    • max_prepared_transactions = 0 (default), but we tried increasing to 100 and the error still occurred;
    • idle_in_transaction_session_timeout = 0;
    • autovacuum is ON with default thresholds (autovacuum_analyze_threshold = 50 and autovacuum_vacuum_threshold = 50);
    • default_transaction_isolation = read committed;
    • rds.force_ssl = 0 (correctly set after upgrade, although we also tried connecting using a SSL connection and the issue persisted);
  5. Triggers and FK Constraints:
    • No triggers exist on the affected table (chapter_metadata);
    • Dropped the FK constraint on the affected column (chapter_id), but the problem persists;
  6. Isolation from External Tools:
    • No tools like AWS DMS, Database Migration Service, or other automation are in use;
    • No Lambda functions, EventBridge rules, or Step Functions target this database;
    • Verified that no jobs (e.g., from pg_cron) are running;
    • The error takes place despite the connection tool (tested with DBeaver and DataGrip);

Additional Findings

  1. Query Logging:
    • Enabling full query logging (log_statement = all) showed no unexpected queries from known users;
    • The ROLLBACK from rdsadmin is still unexplained;
  2. Isolation Testing:
    • The DB is within an internal network (without public access) and the pg_stat_activity query result showed no suspicious query running;

Request for Assistance

  1. What could be causing the rdsadmin ROLLBACK activity? Is this expected behavior in certain scenarios?
  2. Are there any known RDS internal processes that might revert table data under specific conditions?
  3. Could the issue be related to the recent Postgres version upgrade (16 to 17)?
  4. Is there a way to get detailed logs for rdsadmin activity to trace what it is rolling back and why?

I’ve already tried the above debugging steps, but I’m unable to determine the root cause. If anyone has encountered similar issues or has insights into RDS-specific behavior that could explain this, your guidance would be greatly appreciated.

2 Answers
0

Hi,

What do you have as an FK: "ON DELETE SET NULL" ?

If yes, your incremental set to null can be explained by delete activity on the primary table.

Then, check the delete activity on the table with PK to see where it comes from to stop it. Look at the WAL to see if such activity exists. For example, do you have some automatically triggered snapshot restore activity ?

Best,

Didier

profile pictureAWS
EXPERT
answered 2 months ago
  • Thanks for the answer Didier!

    So we had an ON DELETE CASCADE on this FK (as the chapter_metadata was a localization table for the chapter itself, so whenever a chapter was deleted, the metadata would also be gone). We tried also removing the FK constraint (leaving the field with a plain UUID only), but the issue was still there. No chapters are touched; the FKs/PKs are still valid, but the table keeps reverting row-by-row after a while. We also checked the Events tab on the AWS RDS console page, and we only have the snapshot-creating record triggered once every day as expected, but no restore call.

    Thanks, Fernando

0

Based on the information provided, this is indeed an unusual and complex issue. Let's address your questions and provide some insights:

  1. The rdsadmin ROLLBACK activity is not typical behavior and is not expected under normal circumstances. This could indicate an internal RDS process that's encountering an issue and rolling back changes.

  2. There are no known RDS internal processes that should revert table data under normal conditions. However, the behavior you're observing suggests that something is interfering with the data integrity.

  3. The issue could potentially be related to the recent PostgreSQL upgrade from version 16 to 17. Major version upgrades can sometimes introduce unexpected behaviors, especially if there are any incompatibilities with existing configurations or extensions.

  4. Unfortunately, there isn't a direct way for users to access detailed logs for rdsadmin activity. However, you can enable enhanced monitoring and performance insights to get more detailed information about database activity.

Given the complexity of your situation, here are some additional steps you could take:

  1. Check for any PostgreSQL extensions that might be causing conflicts after the upgrade. Ensure all extensions are compatible with PostgreSQL 17.

  2. Review your RDS parameter group settings thoroughly. Some parameters might need adjustment after a major version upgrade.

  3. Monitor the pg_stat_activity view more closely, particularly around the times when data reverts. Look for any patterns in timing or concurrent activities.

  4. Consider creating a test environment with a snapshot of your database to safely experiment with different configurations and isolate the issue.

  5. Check if there are any scheduled maintenance activities or automated processes running on your RDS instance that could be interfering with data integrity.

  6. Review your application code for any potential issues with transaction management or connection pooling that could be causing unexpected rollbacks.

  7. If the issue persists, it may be worth engaging AWS Support directly. They have access to more detailed logs and can investigate RDS-specific behaviors that might not be visible to users.

Remember, this is an unusual situation, and the root cause might be a combination of factors. Systematic investigation and potentially AWS Support involvement may be necessary to fully resolve the issue.
Sources
Amazon RDS for PostgreSQL now supports major version 17 - AWS
Upgrades of the RDS for PostgreSQL DB engine - Amazon Relational Database Service
Amazon RDS for PostgreSQL supports minor versions 15.5, 14.10, 13.13, 12.17, and 11.22

profile picture
answered 2 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions