Skip to content

How do I troubleshoot transaction ID wraparound in Aurora PostgreSQL-Compatible?

7 minute read
0

I want to troubleshoot transaction ID wraparound issues in my Amazon Aurora PostgreSQL-Compatible Edition database (DB) cluster.

Resolution

Monitor metrics and CloudWatch alarms to prevent issues

To monitor the oldest unvacuumed transaction age, check the MaximumUsedTransactionIDs metric in Amazon CloudWatch. To prevent transaction ID wraparound issues, create a CloudWatch alarm. For more information, see Implement an early warning system for transaction ID wraparound in Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

Run diagnostic queries

Use psql or pgAdmin to connect to the PostgreSQL DB instance.

To check the database age, run the following command:

SELECT datname, age(datfrozenxid)
FROM pg_database
ORDER BY 2 DESC
LIMIT 20;

Note: If the output shows that the rdsadmin or template0 database has the oldest transaction ID age, then contact AWS Support.

To connect to the database with the highest age to identify potential issues with tables, run the following command:

SELECT
c.relnamespace::regnamespace as schema_name,
c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age,
pg_size_pretty(pg_table_size(c.oid)) as table_size,
t.relkind, t.relpersistence
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm')
ORDER BY 3 DESC
LIMIT 20;

To review the autovacuum history of tables with high age values, run the following command:

SELECT
  relname,
  n_live_tup, n_tup_upd, n_tup_del, n_dead_tup,
  last_vacuum, last_autovacuum,
  last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname='table_name';

Note: Replace table_name with your table name.

To monitor active vacuum operations, run the following command:

SELECT datname, usename, pid, state, wait_event,
current_timestamp - xact_start AS xact_runtime, query
FROM pg_stat_activity
WHERE upper(query) LIKE '%VACUUM%'
AND pid <> pg_backend_pid()
ORDER BY xact_start;

Resolve autovacuum process blockers

If you use one of the following Aurora PostgreSQL-Compatible versions, then see Identify and resolve aggressive vacuum blockers in Aurora PostgreSQL:

  • 13.19 or later
  • 14.16 or later
  • 15.11 or later
  • 16.7 or later
  • 17.2 or later

Resolve long-running transactions

Long-running transactions or in-transition sessions that are idle can block autovacuum. 

You see the following warning message in your PostgreSQL error logs:

"WARNING: oldest xmin is far in the past HINT: Close open transactions soon to avoid wraparound problems."

To identify transactions that are blocking autovacuum, run the following command:

SELECT pid,
    age(backend_xid),
    substr(query,1,20),
    extract(epoch from now()) - extract(epoch from xact_start) as duration,
    state
FROM pg_stat_activity
WHERE xact_start is not null
AND age(backend_xid) is not null
ORDER BY 2 DESC
LIMIT 10;

To resolve this issue, you can either cancel active queries or terminate connections.

To cancel active queries, run the following command:

SELECT pg_cancel_backend(pid);

Note: Replace pid with the process ID from the query results.

To terminate connections, run the following command:

SELECT pg_terminate_backend(pid);

Note: Replace pid with the process ID from the query results.

To make sure that long-running transactions don't block autovacuum again, set the statement_timeout, idle_in_transaction_session_timeout, and log_min_duration_statement parameters.

Resolve inactive logical replication slots

To identify inactive slots, run the following command:

SELECT slot_name, slot_type, database, xmin, catalog_xmin,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS replicationSlotLag,
    active
FROM pg_replication_slots
ORDER BY age(xmin), age(catalog_xmin) DESC;

Warning: Before you drop a replication slot, confirm that it has no ongoing replication, is inactive, and is in an unrecoverable state. If you drop a slot that's still replicating, active, and recoverable, then you might disrupt replication or cause data loss.

To drop unused slots, run the following command:

SELECT pg_drop_replication_slot('slot_name');

Note: Replace slot_name with your replication slot name.

Resolve reader instance issues

To identify readers in an Aurora cluster that might be preventing hot standby feedback and blocking autovacuum, run the following query:

select server_id, feedback_epoch, feedback_xmin
from aurora_replica_status()
WHERE (feedback_xmin IS NOT NULL AND feedback_xmin > 0)

Run the following query for readers that are preventing hot standby feedback in Global Database clusters:

select server_id,aws_region,feedback_epoch,feedback_xmin
from aurora_global_db_instance_status ()
WHERE (feedback_xmin IS NOT NULL AND feedback_xmin > 0);

Run the following query in the reader instances with the lowest combined value of feedback_epoch and feedback_xmin:

SELECT pid,
    age(backend_xid),
    substr(query,1,20),
    extract(epoch from now()) - extract(epoch from xact_start) as duration,
    state
FROM pg_stat_activity WHERE backend_xmin::text::bigint =feedback_xmin

Note: Replace feedback_xmin with the feedback_xmin value from the previous query results.

If you can safely do so, then use pg_terminate_backend(pid) to terminate the long-running transactions.

Resolve uncommitted prepared transactions

To identify prepared transactions, run the following command:

SELECT database, gid, prepared, owner, database, transaction::text::bigint as xid,
now() - prepared AS time_since_prepared
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

To resolve prepared transactions, you can either commit the prepared transaction or roll it back.

To commit the prepared transaction, run the following command:

COMMIT PREPARED 'gid'

Note: Replace gid with the Global Transaction ID from the query results.

Warning: Before you roll back a prepared transaction, make sure that the transaction isn't part of a distributed transaction sequence.

To roll back the prepared transaction, run the following command:

ROLLBACK PREPARED 'gid'

Note: Replace gid with the Global Transaction ID from the query results.

Resolve orphaned temporary tables

When the PostgreSQL autovacuum process encounters orphaned temporary tables, it logs the following event:

LOG: autovacuum: found orphan temp table "%s"."%s" in database "%s"

To identify orphaned temporary tables, run the following command:

SELECT
  oid,
  relname,
  relnamespace::regnamespace,
  age(relfrozenxid)
FROM pg_class
WHERE relpersistence = 't'
ORDER BY age(relfrozenxid) DESC;

To drop the temporary table, run the following command:

DROP TABLE temporary_table

Note: Replace temporary_table with your temporary table name.

Resolve vacuum performance issues

When autovacuum is performing slowly, freeze operations might lag.

To resolve this issue, first increase the autovacuum_max_workers parameter for more concurrent vacuum operations. Then, use the following formula to equally increase the autovacuum_vacuum_cost_limit parameter:

Individual worker's cost_limit = autovacuum_vacuum_cost_limit / autovacuum_max_workers

If vacuum performance remains insufficient after you increase the parameters, then upgrade your Amazon RDS instance to a class that has more vCPUs. You can modify the autovacuum_vacuum_cost_delay parameter at the table level to speed up autovacuum operations.

To modify the autovacuum_vacuum_cost_delay parameter at the table level, run the following command:

ALTER TABLE mytable SET ( autovacuum_vacuum_cost_delay = value, autovacuum_vacuum_cost_limit = value );

Note: Replace mytable with your table name and value with your desired parameter values.

Large index vacuum performance

When you're vacuuming large indexes, memory constraints can force multiple passes over the same index and significantly increase the total VACUUM duration.

If you're experiencing slow VACUUM operations on large indexes in PostgreSQL 16 or earlier, then check for the following potential causes:

  • The index size exceeds 1 GB.
  • Multiple VACUUM passes are occurring. To view the number of passes, query index_vacuum_count from the pg_stat_progress_vacuum view.
  • You incorrectly configured maintenance_work_mem or autovacuum_work_mem.

To resolve this issue, take the following actions:

  • Drop unused or duplicate indexes.
  • Increase maintenance_work_mem or autovacuum_work_mem. For the new memory setting to take effect, terminate the processes that are currently running autovacuum
  • Manually perform a VACUUM FREEZE operation, and increase maintenance_work_mem within your session.

If you use PostgreSQL version 12 or later, then run the following command to review potential manual vacuums with index exclusions:

VACUUM (INDEX_CLEANUP FALSE, VERBOSE TRUE) table_name;

Note: Replace table_name with your table name.

Logical inconsistency

If a logically inconsistent index blocks the autovacuum process, then you might receive one of the following error messages:

  • ERROR: right sibling's left-link doesn't match:block 5 links to 10 instead of expected 2 in index ix_name
  • ERROR: failed to re-find parent key in index "XXXXXXXXXX" for deletion target page XXX CONTEXT: while vacuuming index index_name of relation schema.table

To resolve this issue, rebuild the index. Or for PostgreSQL versions 12 and later, you can skip the index cleanup.

To rebuild the index, run the following command:

REINDEX INDEX ix_name CONCURRENTLY;

Note: Replace ix_name with your index name.

-or-

To skip the index cleanup for PostgreSQL 12 and later, run the following command:

VACUUM FREEZE INDEX_CLEANUP FALSE table_name;

Note: Replace table_name with your table name.

Related information

Preventing transaction ID wraparound failures on the PostgreSQL website

Working with PostgreSQL autovacuum on Amazon Aurora PostgreSQL

Understanding autovacuum in Amazon RDS for PostgreSQL environments

How do I end long-running queries in my Amazon RDS for PostgreSQL or Aurora PostgreSQL-Compatible DB instance?

AWS OFFICIALUpdated a month ago