
Why do large objects lead to slowness or failure of major version upgrades in RDS/Aurora PostgreSQL?


This article discusses the impact of large objects on major version upgrades in RDS and Aurora PostgreSQL.

PostgreSQL, being an open-source project, regularly releases minor and major updates to address common bugs, enhance security, and resolve data corruption issues. Amazon RDS typically strives to incorporate these new engine versions within a five-month window after their release. When a specific PostgreSQL version reaches end-of-support, you'll need to upgrade your RDS PostgreSQL instances; in such cases, RDS proactively sends email notifications recommending database instance upgrades. You have several options for performing these upgrades: using the RDS console, executing the AWS CLI command 'modify-db-instance' (see the example below), or enabling the auto minor version upgrade feature, which keeps your instances current with compatible minor releases without manual intervention.
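
For example, a major version upgrade through the AWS CLI might look like the following sketch. The instance identifier and target engine version are placeholders; adjust them for your environment.

$ aws rds modify-db-instance \
    --db-instance-identifier mydbinstance \
    --engine-version 16.4 \
    --allow-major-version-upgrade \
    --apply-immediately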

This post explores the impact of large objects on the performance and success of major PostgreSQL version upgrades. We'll examine methods to assess the quantity and overall size of large objects within a database. Additionally, we'll delve into the concept of "orphaned large objects," explaining how they come into existence and their implications. Our discussion will focus on techniques to identify orphaned large objects in your database and strategies for their removal. Addressing these issues can reduce database upgrade times, prevent upgrade failures caused by these objects, and also bring down the storage utilization of the database.

Impact of large objects on major version upgrades

The PostgreSQL major version upgrade process uses the pg_upgrade utility, which involves two key phases:

• Dumping the schema from the old cluster using pg_dump

• Restoring it to the new cluster using pg_restore

During the database upgrade process, pg_dump and pg_restore operations consume substantial memory when handling a large volume of database objects. This high memory usage can lead to two potential outcomes: a significantly extended upgrade duration or an unexpected termination of the process. The instance may experience an Out-of-Memory (OOM) condition during the upgrade, causing an abrupt failure without any error messages in the upgrade logs.

Check the size of large objects within your database

The catalog pg_largeobject holds the data making up large objects. A large object is identified by an OID assigned when it is created, and each large object is broken into segments, or "pages," small enough to be conveniently stored as rows in pg_largeobject. The catalog pg_largeobject_metadata holds the metadata, such as ownership, associated with large objects. You can query pg_largeobject_metadata and pg_largeobject to get details about the large objects in your database:


-- Number of large objects in the database
SELECT COUNT(*) AS pg_largeobject_metadata_count FROM pg_largeobject_metadata;

-- Total on-disk size of the pg_largeobject catalog
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));
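
If you also want to see which individual large objects are the biggest, you can aggregate the page data in pg_largeobject by loid, as in the following sketch. Note that reading pg_largeobject directly requires sufficient privileges (by default access to it is restricted), so this query may not be available to every user.

-- Ten largest large objects by total data size
SELECT loid,
       pg_size_pretty(SUM(octet_length(data))::bigint) AS lo_size
FROM pg_largeobject
GROUP BY loid
ORDER BY SUM(octet_length(data)) DESC
LIMIT 10;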

What are orphaned large objects?

One of the most common reasons for upgrade failure when you have a huge number of large objects in the database is “orphaned large objects”.

In PostgreSQL, “orphaned large objects” refer to large object (LO) entries in the pg_largeobject system table that are no longer referenced by any table in the database. These large objects persist in the database but serve no useful purpose, consuming storage unnecessarily.

How Large Objects Become Orphaned:

  1. Deleting rows without deleting large objects: A large object has a lifecycle of create, put, unlink. Applications use the lo_creat() function to create a new, empty large object and generate an OID for it. Applications put data into the large object using lo_put() or lo_from_bytea(), using the OID to identify which large object to populate. The OID is normally stored for later use (to retrieve the large object) in some table, according to the application schema. When the data row that stores the OID is deleted, the lo_unlink() function must be called to delete the large object; otherwise the large object is left as an orphan in the database with no pointer (OID) referring to it (see the sketch after this list).
  2. Dropping tables that reference large objects: If a table storing large object OIDs is dropped, PostgreSQL doesn't remove the corresponding large objects.
  3. Updating rows without cleaning up old LOs: If an update changes a large object reference, the old large object becomes orphaned unless it is explicitly deleted.
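
The following sketch illustrates the first scenario using a hypothetical table, public.test_lo (the same name that appears in the vacuumlo output later in this article); the table and data are illustrative only.

-- A table whose raster column stores large object OIDs
CREATE TABLE test_lo (id int PRIMARY KEY, raster oid);

-- lo_from_bytea(0, ...) creates a new large object and returns its OID
INSERT INTO test_lo VALUES (1, lo_from_bytea(0, '\xdeadbeef'::bytea));

-- Wrong: deleting the row alone leaves the large object orphaned
DELETE FROM test_lo WHERE id = 1;

-- Right: unlink the large object first, then delete the row
-- SELECT lo_unlink(raster) FROM test_lo WHERE id = 1;
-- DELETE FROM test_lo WHERE id = 1;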

Clean Orphaned Large Objects:

There are two ways of cleaning up large objects. The first is the vacuumlo utility. The second is the lo_manage trigger function, which cleans up and prevents orphaned large objects when a large object reference is updated or deleted.

Using vacuumlo

vacuumlo is a simple utility program that removes orphaned large objects from a PostgreSQL database. Using it, we will first identify the orphaned large objects in the database and then see how to remove them.

Identifying orphaned large objects (-n, --dry-run)

In PostgreSQL, orphaned large objects can be detected by analyzing the pg_largeobject system table. Orphans are entries whose object identifiers (loids) lack corresponding references in the user-defined tables that normally store or link to large objects. This absence of linkage means that while the large object is present in the database, it is not connected to any row in the user tables where it would typically be referenced.
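
If you want to check manually before running vacuumlo, a query along the following lines can list candidate orphans. This sketch assumes that public.test_lo.raster is the only column in the database that stores large object OIDs; in a real schema you would add one NOT EXISTS clause per referencing column.

SELECT m.oid AS orphaned_loid
FROM pg_largeobject_metadata m
WHERE NOT EXISTS (
    SELECT 1 FROM public.test_lo t WHERE t.raster = m.oid
);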

The vacuumlo utility identifies and removes orphaned large objects from databases. This utility has been available since PostgreSQL 9.1.24. If your database users routinely work with large objects, we recommend that you run vacuumlo occasionally to clean up orphaned large objects. You can also use vacuumlo to assess whether your RDS/Aurora PostgreSQL DB cluster has orphaned large objects.

To do so, run vacuumlo with the -n (dry-run) option, which reports what would be removed without removing anything:

$ vacuumlo -v -n -h mycluster.aaaaaa.aws-region.rds.amazonaws.com -p 5432 -U postgres postgres -W
Password: 
Connected to database "postgres"
Test run: no large objects will be removed!
Checking data in public.test_lo
Would remove 1 large objects from database "postgres". 

As the output shows, this database contains orphaned large objects.

Removing orphaned large objects (removing -n from the same command)


$ vacuumlo -v -h mycluster.aaaaaa.aws-region.rds.amazonaws.com -p 5432 -U postgres postgres -W
Password: 
Connected to database "postgres"
Checking data in public.test_lo
Successfully removed 1 large objects from database "postgres".

How vacuumlo works

  1. vacuumlo starts by creating a temporary table containing all the object identifiers (OIDs) of the large objects in your database.
  2. vacuumlo then scans through every column in the database whose data type is oid or lo. If vacuumlo finds a matching OID in one of these columns, it removes that OID from the temporary table. Note that only columns whose type is actually named oid or lo are checked; domains over these types are not.
  3. The remaining entries in the temporary table represent orphaned LOs, which vacuumlo then safely removes (a conceptual SQL sketch follows this list).
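
Conceptually, the three steps correspond to SQL along these lines. This is a simplified sketch, not vacuumlo's actual implementation, and it again assumes public.test_lo.raster is the only referencing column.

-- Step 1: snapshot all large object OIDs
CREATE TEMP TABLE candidate_lo AS
SELECT oid AS lo FROM pg_largeobject_metadata;

-- Step 2: drop OIDs that are still referenced (one DELETE per oid/lo column found)
DELETE FROM candidate_lo c
WHERE EXISTS (SELECT 1 FROM public.test_lo t WHERE t.raster = c.lo);

-- Step 3: whatever remains is orphaned; vacuumlo unlinks each in batches
SELECT lo_unlink(lo) FROM candidate_lo;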

You can tune the performance of vacuumlo with the -l option, which limits how many large objects are removed per transaction (the default is 1,000). A higher limit lets vacuumlo process more LOs per transaction, but because each removed large object takes a lock, setting it too high can exceed max_locks_per_transaction.
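
For example, the following sketch raises the per-transaction limit to 10,000 (the endpoint is the same placeholder used in the earlier examples):

$ vacuumlo -v -l 10000 -h mycluster.aaaaaa.aws-region.rds.amazonaws.com -p 5432 -U postgres postgres -W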

Using lo_manage

If you are using large objects in the database, you can use the lo_manage function in triggers on table columns that contain large object references, to clean up and to prevent orphaned large objects when a reference is updated or deleted. Whenever you delete or modify a value that references a large object, the trigger unlinks the object (lo_unlink) from its reference. Use the trigger on a column only if the column is the sole database reference to the large object.
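
The lo_manage function is provided by the lo extension. The following sketch sets up the extension and an illustrative images table with a raster column, which the trigger examples below assume; the table and column names are illustrative only.

-- The lo extension provides the lo type and the lo_manage trigger function
CREATE EXTENSION IF NOT EXISTS lo;

-- Illustrative table used by the trigger examples that follow
CREATE TABLE images (
    id     int PRIMARY KEY,
    raster lo   -- stores a large object OID
);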

To set up triggers on columns that reference large objects, you need to do one of the following:

• Create a BEFORE UPDATE OR DELETE trigger on each column that contains unique references to large objects, passing the column name as the trigger argument.

postgres=> CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON images FOR EACH ROW EXECUTE FUNCTION lo_manage(raster);

• Alternatively, create the trigger so that it fires only when the column itself is updated.

postgres=> CREATE TRIGGER t_raster BEFORE UPDATE OF raster ON images FOR EACH ROW EXECUTE FUNCTION lo_manage(raster);

The lo_manage trigger function works only in the context of updating or deleting column data, depending on how you define the trigger. It has no effect when you perform a DROP or TRUNCATE operation on a table. That means you should delete the rows that reference large objects before dropping a table, to prevent creating orphaned objects.

For example, suppose that you want to drop the images table. You first delete its rows, so the trigger unlinks the referenced large objects, as follows.

postgres=> DELETE FROM images;

Assuming that the lo_manage trigger is defined on that column to handle deletes, you can now safely drop the table.

Key Considerations:

• Your database may contain a very large number of legitimate large objects and no orphans at all; in such scenarios, you may have to test the upgrade after scaling your instance class to one with more memory.

• We always recommend restoring a snapshot and testing the upgrade on it first.

• Autovacuum doesn’t handle orphaned large objects, so vacuumlo is the only solution; there is currently no other method to remove orphaned LOs.

• When designing your database schema, avoid using large objects when possible and consider using alternative data types like bytea instead.

• Run vacuumlo regularly, at least weekly, to prevent issues with orphaned LOs. You can create custom scripts or cron jobs so that these large objects don't accumulate (see the example cron entry after this list).

• vacuumlo can take hours or even days to complete, depending on the number of orphaned large objects. vacuumlo doesn't block read/write operations, though resource utilization (CPU, memory, and IOPS) can increase while it runs.

• Use a trigger with the lo_manage function on tables that store large objects to help prevent orphaned LOs from being created.

• Removing orphaned large objects can help to drastically reduce upgrade time.
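
As referenced above, one possible way to schedule regular cleanup is a cron entry like the following sketch. It assumes a client host with vacuumlo installed and a ~/.pgpass entry for the endpoint so that no password prompt is needed; the schedule and connection details are placeholders.

# Run vacuumlo every Sunday at 02:00 against the postgres database
0 2 * * 0 vacuumlo -h mycluster.aaaaaa.aws-region.rds.amazonaws.com -p 5432 -U postgres postgres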
