
Swap in AWS RDS PostgreSQL Database Instances

40 minute read
Content level: Expert

Swap activity on Amazon RDS for PostgreSQL is one of the most frequently misunderstood signals during memory-related investigations. Many users see SwapUsage > 0 in CloudWatch and assume something is wrong, when in most cases the swap is benign — cold, idle backend memory moved to disk that nobody is accessing. This article helps readers distinguish between harmless swap and genuine memory pressure by explaining how Linux swap works on RDS PostgreSQL instances, what the monitoring metrics actual

1. Introduction

This article provides a comprehensive guide to understanding, monitoring, and troubleshooting swap on Amazon RDS for PostgreSQL instances. It covers Linux swap fundamentals as they apply to managed database instances, PostgreSQL's memory allocation patterns, all relevant Enhanced Monitoring and Performance Insights counters, and practical troubleshooting workflows.

Scope: This article covers Amazon RDS for PostgreSQL ONLY. Amazon Aurora PostgreSQL has a fundamentally different compute/storage architecture (distributed storage layer, no local EBS for data) that changes the swap picture significantly.


2. Swap Fundamentals on RDS PostgreSQL Instances

What Is Swap at the Linux OS Level

Linux swap space is a disk-based extension of physical RAM. When the kernel's memory reclaim algorithm determines that physical RAM is under pressure, it moves less-recently-used memory pages from RAM to the swap device on disk. This frees physical RAM for active processes. Later, when a process accesses a page that was previously moved to swap, the kernel reads it back from disk into RAM before the process can continue.

Swap is implemented as either a dedicated swap partition or a swap file on a filesystem. On RDS PostgreSQL instances, swap is provisioned as a fixed-size swap area at the platform level — the size is determined by the instance class and is not customer-adjustable.

Linux reference: For the kernel's swap subsystem internals, see man 2 swapon, man 8 mkswap, and the kernel documentation on swap management.

How Much Swap Is Provisioned per Instance Class

The amount of swap space on an RDS PostgreSQL instance is not a fixed value — it varies by instance class and is provisioned at the infrastructure layer during instance creation. The swap size is not configurable by customers and is not exposed as a parameter group setting.

How swap size is determined:

The swap size is calculated based on the instance's total physical memory and the instance family/generation. The general relationship is:

  • Burstable instances (T-class): Swap size is typically 2× the instance memory (e.g., a db.t3.micro with 1 GB RAM gets ~2 GB swap; a db.t3.small with 2 GB RAM gets ~4 GB swap). This larger relative swap allocation compensates for the smaller RAM on burstable instances, providing more overflow buffer.
  • Standard/Memory-optimized instances (M-class, R-class): Swap size is typically a fixed fraction of instance memory, often in the range of 0.5–1× RAM for smaller sizes, tapering to a smaller fraction for very large instances. For example, a db.r5.large (16 GB RAM) may have ~2–4 GB swap, while a db.r5.8xlarge (256 GB RAM) may have ~4–8 GB swap.

How to check the actual swap size on your instance:

You can determine the exact swap size provisioned on any running RDS PostgreSQL instance using Enhanced Monitoring:

Enhanced Monitoring → swap.total (in kilobytes)

Or via Performance Insights:

os.swap.total (in kilobytes)

For example, if swap.total = 4194300, the instance has approximately 4 GB of swap space.
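The conversion above can be sketched in a few lines. This is only a unit-conversion helper; `swap_total_gib` is a hypothetical name, and the input is the `swap.total` value in kilobytes as reported by Enhanced Monitoring.

```python
KIB_PER_GIB = 1024 * 1024  # kilobytes (KiB) in one GiB


def swap_total_gib(swap_total_kb):
    """Convert a swap.total reading (kilobytes) to GiB."""
    return swap_total_kb / KIB_PER_GIB


# The example value from the text: 4194300 kB is just under 4 GiB.
print(f"{swap_total_gib(4194300):.2f} GiB")  # 4.00 GiB
```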

Key takeaway: You cannot change the swap size on an RDS PostgreSQL instance. If the provisioned swap is insufficient for your workload's memory overflow needs, the correct remediation is to resize to a larger instance class (which provides both more RAM and more swap) or to reduce the workload's memory footprint through parameter tuning.

Why RDS PostgreSQL Instances Have Swap Space

Swap exists as a safety net against the OOM (Out-Of-Memory) Killer. Without swap, any transient spike in memory demand that exceeds physical RAM would immediately trigger the OOM Killer, terminating processes to reclaim memory. On an RDS PostgreSQL instance, an OOM kill of the main postgres server process (the postmaster) causes a full instance restart: all connections dropped, all in-flight transactions lost, crash recovery on startup.

Swap provides an overflow buffer that absorbs short-lived memory pressure, giving the system time to stabilize without killing processes. The trade-off: swap is backed by EBS storage, which is orders of magnitude slower than RAM. Sustained reliance on swap introduces significant latency, but it prevents catastrophic process death during transient memory spikes.

Memory Control Groups (cgroups) on RDS PostgreSQL

RDS PostgreSQL instances use Linux cgroups (control groups) to isolate and limit memory usage between the database engine and platform processes. Understanding cgroups is important for diagnosing memory and swap issues because the cgroup configuration directly affects how the kernel reclaims memory and decides whether to swap.

What are cgroups?

Cgroups are a Linux kernel feature that groups processes into hierarchical units and applies resource limits (memory, CPU, I/O) to each group independently.

Linux reference: See man 7 cgroups for the general cgroups overview and the kernel documentation on memory cgroups (cgroup v1) for details.

On RDS PostgreSQL instances, the PostgreSQL engine processes (the main postgres server process, backends, background workers) run inside a dedicated memory cgroup. This cgroup enforces a hard memory limit and is configured so that the kernel strongly prefers dropping page cache over swapping out PostgreSQL process memory. This means anonymous pages (backend memory, work_mem allocations) are only swapped when file-backed pages are nearly exhausted within the cgroup.

Why this matters for swap behavior:

The DB engine cgroup does not get 100% of the instance's physical RAM. A portion is reserved for platform processes, kernel structures, and page tables. The approximate relationship is:

DB engine memory limit ≈ total_instance_memory
                       - platform_and_kernel_reserve
                       - hugepage_reserved

DB engine memory+swap limit ≈ DB engine memory limit
                             + (total_swap × 0.9)
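As a worked illustration of the two relationships above, the sketch below plugs in example numbers. The reserve, huge page, and swap sizes are hypothetical values chosen for the arithmetic; the real figures are set by the platform and are not published per instance class.

```python
def engine_limits_gib(total_memory_gib, reserve_gib, hugepage_gib,
                      total_swap_gib, swap_share=0.9):
    """Approximate the DB engine cgroup limits described in the text.

    All inputs are illustrative assumptions, in GiB.
    """
    memory_limit = total_memory_gib - reserve_gib - hugepage_gib
    memory_plus_swap_limit = memory_limit + total_swap_gib * swap_share
    return memory_limit, memory_plus_swap_limit


# Hypothetical 64 GiB instance: 4 GiB reserve, 16 GiB huge pages, 4 GiB swap.
mem, mem_swap = engine_limits_gib(64, 4, 16, 4)
print(f"memory limit ~{mem} GiB, memory+swap limit ~{mem_swap:.1f} GiB")
```

Under these assumed inputs the engine would see roughly 44 GiB of regular memory before the cgroup starts reclaiming, which is noticeably less than the nominal instance size.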

When the PostgreSQL engine's memory usage approaches its cgroup limit, the kernel reclaims memory within the cgroup — first dropping page cache, then (reluctantly, given the cgroup's preference for page cache eviction) swapping out anonymous pages. If both memory and swap limits are exhausted, the OOM killer is invoked within the cgroup.

How this surfaces in customer-visible monitoring:

You cannot inspect cgroup files directly on RDS instances, but the effects are visible through Enhanced Monitoring and Performance Insights:

| Linux concept (under the hood) | What you see in monitoring |
| --- | --- |
| RSS (Resident Set Size): anonymous pages in RAM | os.memory.db.residentSetSize in Performance Insights |
| Swap usage within the engine cgroup | os.memory.db.swap in Performance Insights |
| Page faults (major = swap-in) | swap.in rate in Enhanced Monitoring |
| Total memory pressure | FreeableMemory in CloudWatch (system-wide, not cgroup-specific) |

Linux reference: RSS, page cache, and swap are standard Linux memory concepts. See man 5 proc (search for VmRSS) and the kernel memory cgroup documentation for how these are tracked per cgroup.

Swap-In vs Swap-Out: Definitions and Performance Implications

Swap-Out occurs when the kernel writes memory pages from RAM to the swap device. It generates write I/O to EBS. Swap-out is the kernel's response to memory pressure — it is evicting pages that haven't been accessed recently to free RAM for active workloads.

Swap-In occurs when a process accesses a memory page that was previously swapped out. The kernel must read that page back from the swap device into RAM before the process can continue. Swap-in generates read I/O latency — the PostgreSQL backend or background worker is stalled waiting for a disk read.

Performance impact hierarchy:

| Pattern | Severity | Meaning |
| --- | --- | --- |
| Swap-out only, no swap-in | Benign | Cold pages moved to swap, never accessed again |
| Low swap-in, stable swap usage | Low concern | Occasional access to cold pages |
| Sustained swap-in | Problematic | Active workload hitting swapped pages regularly |
| High swap-in + high swap-out (thrashing) | Critical | Feedback loop: pages evicted and immediately needed again |
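The hierarchy above can be turned into a rough triage helper. The sketch below is a minimal illustration, not AWS guidance: `classify_swap` and its thresholds (`high_in`, `high_out`, and the "fewer than half of the intervals" rule) are assumptions made for the example, and the inputs are per-interval swap.in and swap.out rates in kB/s.

```python
from statistics import mean


def classify_swap(swap_in, swap_out, high_in=200.0, high_out=500.0):
    """Classify a window of swap.in / swap.out samples (kB/s).

    The thresholds are illustrative assumptions, not AWS guidance.
    """
    avg_in, avg_out = mean(swap_in), mean(swap_out)
    # Fraction of intervals in which any swap-in happened at all.
    active_in = sum(1 for rate in swap_in if rate > 0) / len(swap_in)

    if avg_in == 0:
        return "benign"                # swap-out only (or no swap activity)
    if active_in < 0.5:
        return "low concern"           # occasional access to cold pages
    if avg_in >= high_in and avg_out >= high_out:
        return "critical (thrashing)"  # pages evicted and needed again
    return "problematic"               # sustained swap-in


print(classify_swap([0, 0, 0, 0, 0, 0], [100, 0, 150, 0, 80, 0]))
print(classify_swap([1000] * 6, [2000] * 6))
```

Tune the thresholds against your own baseline; the point is that the decision keys on swap-in, not on swap usage alone.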

Relationship Between Swap and the OOM Killer

The OOM Killer is the kernel's last resort when both physical RAM and swap space are exhausted. The progression:

  1. Memory pressure increases → kernel shrinks page cache
  2. Page cache minimized → kernel starts swapping out anonymous pages
  3. Swap space fills up → kernel cannot satisfy allocation requests
  4. OOM Killer invoked → selects process with highest oom_score and kills it

Linux reference: See man 5 proc (search for oom_score and oom_score_adj) and the kernel documentation on OOM killer behavior for how the kernel selects which process to kill.

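On RDS PostgreSQL instances:

  • OOM kill of a backend process → that client connection is terminated and its in-flight transaction aborted. Because the OOM Killer uses SIGKILL, the postmaster treats this as a backend crash, terminates the remaining backends, and runs crash recovery, so other connections are dropped as well
  • OOM kill of the main postgres server process (the postmaster) → full instance restart, all connections dropped, crash recovery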

Swap delays the OOM Killer but does not prevent it when the workload's total memory footprint exceeds RAM + swap.


3. RDS PostgreSQL Memory Architecture

Instance Memory Layout

An RDS PostgreSQL instance runs on a dedicated host with a fixed amount of physical RAM shared among several major consumers:

Major Consumers of physical shared RAM

shared_buffers with Huge Pages (Pinned) vs Without (Swappable)

shared_buffers is the single largest memory allocation on any RDS PostgreSQL instance. It defaults to approximately 25% of instance RAM (formula: {DBInstanceClassMemory/32768}).
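The 25% figure follows directly from the formula, because shared_buffers is expressed in 8 kB buffers. The sketch below works through the arithmetic; it treats DBInstanceClassMemory as the nominal instance memory in bytes, which is a simplifying assumption (the value RDS actually substitutes may be somewhat lower).

```python
BLOCK_SIZE = 8192  # PostgreSQL default block size in bytes


def default_shared_buffers_bytes(instance_memory_bytes):
    """Apply {DBInstanceClassMemory/32768} and convert buffers to bytes."""
    buffers = instance_memory_bytes // 32768  # number of 8 kB buffers
    return buffers * BLOCK_SIZE


ram = 64 * 2**30  # nominal 64 GiB, for illustration
sb = default_shared_buffers_bytes(ram)
print(f"shared_buffers ~{sb / 2**30:.0f} GiB ({sb / ram:.0%} of RAM)")
```

Dividing by 32768 and multiplying by 8192 is the same as dividing by 4, which is where the one-quarter share comes from.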

With huge pages enabled (default on most RDS PostgreSQL instance classes):

  • The shared_buffers segment is backed by Linux huge pages (2 MB pages on x86_64)
  • Huge pages are pinned in physical RAM by the kernel and are NEVER swapped
  • The kernel cannot reclaim huge-page-backed memory under any circumstances
  • This removes the largest memory consumer from the kernel's swap candidate pool entirely
  • Page table overhead is reduced by ~512x (one PTE per 2 MB instead of per 4 KB)

Linux reference: See man 5 proc (search for Hugepagesize) and the kernel documentation on HugeTLB pages for how huge pages are allocated and why they are pinned (non-swappable, non-reclaimable).

Without huge pages (smaller instance classes or fallback):

  • shared_buffers is backed by regular 4 KB pages
  • These pages ARE swappable — the kernel can and will evict them under memory pressure
  • Swapping shared buffers is catastrophic: every buffer pool access that hits a swapped page stalls on EBS I/O
  • Page table overhead is significant: a 4 GB shared_buffers segment requires ~8 MB of page table entries per process that maps it
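The page table figures quoted for both cases can be checked with simple arithmetic. The sketch below counts only leaf page table entries at 8 bytes each (the x86_64 entry size) and ignores the upper page table levels, so it is a lower bound rather than an exact measurement.

```python
PTE_BYTES = 8  # size of one page table entry on x86_64


def page_table_bytes(segment_bytes, page_bytes):
    """Leaf page table entries needed to map a segment, in bytes."""
    entries = segment_bytes // page_bytes
    return entries * PTE_BYTES


segment = 4 * 2**30                             # 4 GiB shared_buffers
small = page_table_bytes(segment, 4 * 2**10)    # regular 4 KiB pages
huge = page_table_bytes(segment, 2 * 2**20)     # 2 MiB huge pages

print(f"4 KiB pages: {small / 2**20:.0f} MiB per mapping process")
print(f"2 MiB pages: {huge / 2**10:.0f} KiB per mapping process")
print(f"reduction:   {small // huge}x")
```

Because every backend maps the shared segment, the per-process cost with 4 KiB pages is multiplied by the number of connections that have touched the whole buffer pool.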

How to verify huge pages status:

-- Check the huge_pages parameter setting
SHOW huge_pages;  -- 'on', 'try', or 'off'

Per-Backend Memory

Each PostgreSQL backend (one per client connection) allocates private memory:

| Component | Idle | Active (complex query) | Swappable? |
| --- | --- | --- | --- |
| Process image + stack | ~5 MB | ~5 MB | Yes |
| Catalog cache | ~1–5 MB | ~5–50 MB | Yes |
| Plan cache | ~1–5 MB | ~5–20 MB | Yes |
| work_mem (per plan node) | 0 | 4 MB – 1 GB+ per node | Yes |
| temp_buffers | 0 | Up to temp_buffers setting | Yes |
| Extension memory contexts | Varies | Varies | Yes |

Critical: work_mem is allocated per plan node per query. A single query with 4 hash join nodes at work_mem = 256 MB consumes up to 1 GB. Multiply by concurrent backends and the aggregate can easily exceed available RAM.

Double Buffering with EBS

RDS PostgreSQL exhibits "double buffering" because data flows through both shared_buffers and the OS page cache:

  1. PostgreSQL reads a page not in shared_buffers → issues read() syscall
  2. Kernel checks page cache → if miss, reads from EBS into page cache
  3. Kernel copies data from page cache into the shared_buffers segment
  4. Same data now exists in both shared_buffers AND page cache

This means the effective "cache" is larger than shared_buffers alone — the page cache provides a second tier. But it also means memory is used less efficiently than it could be. The page cache portion is reclaimable (the kernel drops it under pressure), while the shared_buffers portion (with huge pages) is pinned.

For writes, dirty pages flow: shared_buffers → page cache (via write()) → EBS (via kernel writeback). The page cache acts as a write-back buffer between PostgreSQL and EBS.


4. When Pages Are Moved Out to Swap (Swap-Out Use Case)

This section explains the kernel's page reclaim algorithm and provides a concrete scenario of swap-out on an RDS PostgreSQL instance.

What Triggers the Kernel to Swap Out Pages

The kernel's memory reclaim is triggered when free memory drops below the low watermark (the zone watermarks are derived from vm.min_free_kbytes). At this point, the kswapd daemon wakes up and begins scanning memory zones to reclaim pages. If kswapd cannot reclaim fast enough, direct reclaim occurs synchronously in the context of the allocating process (which stalls that process).

The reclaim algorithm must choose which pages to evict. It has two pools to draw from:

  1. File-backed pages (page cache): Pages that are backed by a file on disk. Clean file-backed pages can simply be dropped (they can be re-read from EBS). Dirty file-backed pages must be flushed to EBS first, then dropped.

  2. Anonymous pages: Pages that have no file backing — process heap, stack, mmap(MAP_ANONYMOUS|MAP_PRIVATE) allocations. These pages can only be reclaimed by writing them to the swap device. This is swap-out.

Linux reference: The distinction between file-backed and anonymous pages is fundamental to the kernel's page reclaim. See the kernel documentation on page reclaim concepts and man 2 mmap for the MAP_ANONYMOUS flag. The active/inactive LRU lists described below are documented in the kernel source at mm/vmscan.c.

The Kernel's Page Reclaim Algorithm

The kernel maintains two LRU (Least Recently Used) lists per memory zone:

  • Active list: Pages that have been accessed recently (considered "hot")
  • Inactive list: Pages that haven't been accessed recently (candidates for eviction)

Pages are promoted from inactive → active when accessed. Pages are demoted from active → inactive when they age out (haven't been accessed for a while). Eviction happens from the tail of the inactive list — the coldest pages are evicted first.
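The promote/demote/evict cycle can be illustrated with a toy model. This is a deliberately simplified sketch and not the kernel's implementation: `TwoListLRU`, its `capacity` and `active_limit` parameters, and the "promote on second access" rule are assumptions made to keep the example small.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy active/inactive page lists; oldest entries sit at the front."""

    def __init__(self, capacity, active_limit):
        if not 0 < active_limit < capacity:
            raise ValueError("need 0 < active_limit < capacity")
        self.capacity = capacity
        self.active_limit = active_limit
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def touch(self, page):
        """Record an access and return the list of evicted pages."""
        if page in self.active:
            self.active.move_to_end(page)          # still hot
        elif page in self.inactive:
            del self.inactive[page]                # promote on re-access
            self.active[page] = True
            while len(self.active) > self.active_limit:
                old, _ = self.active.popitem(last=False)
                self.inactive[old] = True          # demote the oldest
        else:
            self.inactive[page] = True             # new pages start cold

        evicted = []
        while len(self.active) + len(self.inactive) > self.capacity:
            victim, _ = self.inactive.popitem(last=False)  # coldest first
            evicted.append(victim)
        return evicted


lru = TwoListLRU(capacity=3, active_limit=2)
for page in ["a", "b", "a", "c"]:
    lru.touch(page)
print(lru.touch("d"))  # ['b']: 'a' was re-accessed, 'b' never was
```

Page "a" survives because its second access moved it to the active list, while "b", the oldest page that was never touched again, is the one evicted.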

Both anonymous and file-backed pages have their own active/inactive LRU lists. The kernel's reclaim algorithm decides how aggressively to scan each list based on a tunable called swappiness (see kernel documentation):

scan_ratio = anon_pages_scanned / file_pages_scanned

At the default swappiness (60 on stock Linux):
  The kernel weights anonymous pages at 60 against 140 for file-backed pages
  (swappiness : 200 − swappiness in the classic reclaim code), roughly 0.43×.
  It prefers to drop file-backed pages but WILL swap out anonymous pages too.

At low swappiness (e.g., 0):
  The kernel almost exclusively drops file-backed pages.
  Anonymous pages are only swapped when file-backed pages are nearly exhausted.

On RDS PostgreSQL, the DB engine cgroup is configured with a low swappiness, meaning the kernel strongly prefers to drop page cache rather than swap out PostgreSQL process memory. However, during global reclaim (system-wide memory pressure), even processes in a low-swappiness cgroup can be swapped out.
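To make the effect of swappiness concrete, the sketch below uses the classic weighting found in older kernels' reclaim code (before 5.8), where anonymous pages are weighted by swappiness and file-backed pages by 200 minus swappiness. This is a simplified model: real kernels also factor in recent reclaim cost and list sizes, and the actual swappiness value configured on RDS is not published.

```python
def scan_weights(swappiness):
    """Return (anon_weight, file_weight) under the classic model.

    Simplified from older mm/vmscan.c behaviour; assumes 0..100 range.
    """
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    return swappiness, 200 - swappiness


for value in (0, 10, 60, 100):
    anon, file_backed = scan_weights(value)
    print(f"swappiness={value:3d}  anon:file = {anon}:{file_backed} "
          f"(ratio {anon / file_backed:.2f})")
```

Under this model a swappiness of 60 gives roughly a 3:7 anon-to-file weighting, and only a value of 100 treats the two pools equally.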

Which Pages Get Swapped Out First

The kernel evicts pages from the tail of the inactive anonymous LRU list — these are the anonymous pages that have gone the longest without being accessed. On an RDS PostgreSQL instance, typical swap-out candidates are:

| Candidate | Why it's cold |
| --- | --- |
| Idle backend memory | Backend connected but not executing queries for hours |
| Cold catalog caches | Cached metadata for tables/schemas not recently queried |
| Unused extension memory contexts | Extensions that allocated memory at load time but aren't actively used |
| Old autovacuum worker memory | Worker finished but process hasn't been recycled |
| Logical replication decode buffers | Slot exists but replication is paused or slow |

Concrete Scenario: Burst of Concurrent Hash Joins

Consider an RDS PostgreSQL db.r5.2xlarge instance (64 GB RAM):

  • shared_buffers = 16 GB (huge pages, pinned)
  • OS + platform processes + kernel = ~4 GB
  • Page cache = ~20 GB (caching EBS data)
  • Available for backends = ~24 GB (page cache is reclaimable)
  • work_mem = 512 MB
  • max_connections = 200, typically 30 active

Normal state: 30 active backends × ~50 MB average working memory = ~1.5 GB. Plenty of headroom.

Burst event: A reporting workload kicks off. 60 backends simultaneously execute complex analytical queries with hash joins:

  1. Each backend allocates up to work_mem per hash join node: 60 × 512 MB = 30 GB of demand with a single hash node per query, and a multiple of that when queries contain several hash nodes
  2. Total memory demand now exceeds the ~24 GB available for backends
  3. Kernel's kswapd wakes up, begins reclaiming:
    • First: drops clean page cache pages (EBS data that can be re-read)
    • Page cache shrinks from 20 GB → 2 GB (kernel keeps a minimum)
    • Still not enough once queries run more than one hash node: at two nodes each, 60 GB demand + 16 GB pinned shared_buffers + 4 GB OS = 80 GB > 64 GB
  4. Kernel begins scanning the anonymous inactive LRU list
  5. Finds cold pages: idle backends that haven't executed queries in hours, old catalog cache entries, unused extension memory
  6. Swaps out those cold anonymous pages to make room for the active hash join allocations

Result: swap.out counter increases in Enhanced Monitoring. The swapped pages belong to idle backends and cold memory regions.
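Whether a burst like this spills into swap depends heavily on how many work_mem-sized nodes each query runs at once. The sketch below redoes the budget with the scenario's figures; `nodes_per_query` is a hypothetical parameter, the calculation assumes every node grows to the full work_mem, and it ignores the memory already held by existing backends.

```python
def burst_budget_gib(ram, shared_buffers, os_reserve, min_page_cache,
                     backends, work_mem_mib, nodes_per_query):
    """Return (work_mem demand, remaining headroom) in GiB.

    Negative headroom means the shortfall must come from swap
    (or ends in an OOM kill if swap is too small).
    """
    demand = backends * nodes_per_query * work_mem_mib / 1024
    committed = demand + shared_buffers + os_reserve + min_page_cache
    return demand, ram - committed


for nodes in (1, 2):
    demand, headroom = burst_budget_gib(
        ram=64, shared_buffers=16, os_reserve=4, min_page_cache=2,
        backends=60, work_mem_mib=512, nodes_per_query=nodes)
    print(f"{nodes} node(s)/query: demand {demand:.0f} GiB, "
          f"headroom {headroom:+.0f} GiB")
```

With one node per query the burst fits after the page cache shrinks; with two nodes per query the same 60 backends overshoot physical RAM by a wide margin.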

The Impact: Swap-Out Is Only a Problem When Followed by Swap-In

The swap-out itself is not the performance problem. The pages that were swapped out were cold — nobody was using them. The system successfully accommodated the burst workload by moving unused memory to disk.

The problem arises IF those idle backends later become active and access their swapped-out memory. At that point, swap-in occurs — the backend stalls waiting for EBS to return the page. If the swapped pages remain cold indefinitely (the idle backends disconnect, or they only execute simple queries that don't touch the swapped regions), the swap-out was entirely harmless.

Concrete Scenario: Idle Backends Resume Activity (Swap-In Impact)

Continuing from the hash join burst scenario above — let's see what happens 2 hours later when the swapped-out pages are actually needed.

Setup (2 hours after the burst):

The reporting workload has finished. The 60 analytical backends have disconnected. The instance is back to its normal 30 active backends. However, 15 of the original idle backends — whose catalog caches, plan caches, and connection state were swapped out during the burst — are still connected and have been idle the entire time.

The trigger: At 2:00 PM, the application's afternoon batch cycle begins. Those 15 idle backends simultaneously receive new queries — a mix of SELECT statements against tables they haven't touched since their memory was swapped out.

What happens, step by step:

  1. Backend PID 12345 receives SELECT * FROM orders WHERE customer_id = 42
  2. The backend needs to look up the orders table in its catalog cache (pg_class, pg_attribute entries) — but those pages were swapped out 2 hours ago
  3. The CPU triggers a major page fault — the kernel must read the page from the swap device (EBS) back into RAM
  4. The backend is stalled — it cannot proceed until the swap-in I/O completes
  5. EBS swap-in latency: typically 0.5–2 ms per 4 KB page, but can spike to 5–10 ms under EBS contention
  6. The backend may need multiple swapped pages (catalog cache, plan cache, connection state) — each triggers a separate page fault
  7. First query latency: instead of the normal 2 ms, it takes 50–200 ms as dozens of pages are faulted in from swap
  8. Subsequent queries on the same backend are fast again — the pages are now back in RAM
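The first-query latency in step 7 is essentially the number of faulted pages multiplied by the per-page swap-in latency. The sketch below assumes 100 pages per backend (an illustrative figure standing in for "dozens of pages") and that faults are served one after another with no readahead, which is a pessimistic simplification.

```python
def swap_in_stall_ms(pages, latency_ms_per_page):
    """Estimated stall if each swapped page is faulted in serially."""
    return pages * latency_ms_per_page


PAGES = 100  # assumed number of 4 KB pages a resuming backend touches

for label, latency in [("typical low", 0.5), ("typical high", 2.0),
                       ("EBS contention", 10.0)]:
    stall = swap_in_stall_ms(PAGES, latency)
    print(f"{label:15s} {latency:4.1f} ms/page -> ~{stall:.0f} ms stall")
```

At 0.5–2 ms per page this lands in the 50–200 ms range quoted above, and it shows how quickly the stall grows if EBS latency rises under contention.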

Multiply by 15 backends resuming simultaneously:

Each of the 15 backends goes through the same page fault storm. The aggregate swap-in I/O creates a burst of EBS read requests from the swap device, competing with normal data I/O.

What this looks like in monitoring:

Timeline (Enhanced Monitoring, 1-minute granularity):

Time        swap.in    swap.out   FreeableMemory   Avg query latency
─────────   ────────   ────────   ──────────────   ─────────────────
1:58 PM     0 kB/s     0 kB/s     12 GB            2 ms
1:59 PM     0 kB/s     0 kB/s     12 GB            2 ms
2:00 PM     4,800 kB/s 0 kB/s     11.5 GB          85 ms    ← backends resume
2:01 PM     2,400 kB/s 0 kB/s     11.2 GB          45 ms    ← still faulting in
2:02 PM     200 kB/s   0 kB/s     11.0 GB          8 ms     ← most pages back in RAM
2:03 PM     0 kB/s     0 kB/s     11.0 GB          2 ms     ← normal

Key observations:

  • swap.in spikes to ~4.8 MB/sec at 2:00 PM, which works out to roughly 320 kB/sec of catalog/plan cache pages per backend across the 15 resuming backends
  • swap.out stays at 0 — no new pages are being evicted, there's enough RAM now
  • FreeableMemory drops slightly (12 → 11 GB) as swapped pages return to RAM, displacing some page cache
  • Query latency spikes 40× (2 ms → 85 ms) during the swap-in storm, then recovers within 2–3 minutes
  • In Performance Insights, you'd see IO:DataFileRead wait events spike at 2:00 PM — swap-in manifests as generic I/O waits because the kernel handles it transparently

Contrast — what if those backends had disconnected and reconnected instead?

If the idle backends had disconnected during the burst and reconnected at 2:00 PM, there would be no swap-in at all. Fresh backends start with empty catalog caches and plan caches — they build them from scratch by reading from shared_buffers or EBS (normal I/O path, not swap). The first-query latency would still be slightly higher than steady-state (cold cache), but it would be normal PostgreSQL cache-warming behavior, not swap-induced page fault storms.

This is why connection pooling (Amazon RDS Proxy, PgBouncer) is one of the most effective mitigations for swap-related performance issues — poolers recycle connections, preventing long-lived idle backends from accumulating cold swapped-out memory that later causes swap-in storms when reactivated.


5. Can FreeableMemory Stay Sufficient While Swap-Out Still Happens?

YES. This is absolutely possible and is one of the most common sources of confusion when investigating swap on RDS PostgreSQL instances.

Why This Happens

What FreeableMemory actually measures:

CloudWatch FreeableMemory ≈ MemFree + Buffers + Cached

Source & caveat: The AWS CloudWatch documentation for RDS describes FreeableMemory as "the amount of available random access memory." On Linux (since kernel 3.14+), this corresponds to MemAvailable from /proc/meminfo, which the kernel calculates as approximately MemFree + Buffers + Cached minus non-reclaimable portions (e.g., shared memory segments that cannot be dropped). The formula above is a commonly used approximation; the actual kernel calculation is slightly more nuanced. For practical purposes, the approximation is sufficient for understanding swap behavior.

This includes:

  • MemFree: Truly unused RAM (usually very small on a healthy system)
  • Buffers: Kernel buffer cache (small)
  • Cached: OS page cache (can be very large — often 10–30+ GB)

The page cache is counted as "freeable" because the kernel CAN drop it if needed — it's backed by files on EBS that can be re-read. So FreeableMemory of 15 GB might mean: 500 MB truly free + 14.5 GB of page cache.
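The approximation can be illustrated against /proc/meminfo-style output. You cannot read /proc/meminfo on an RDS instance, so the sample text below is made up for the example (chosen to match the 500 MB free plus 14.5 GB page cache case); `parse_meminfo` and `freeable_kb` are hypothetical helper names.

```python
SAMPLE_MEMINFO = """\
MemTotal:       32000000 kB
MemFree:          512000 kB
Buffers:          100000 kB
Cached:         14500000 kB
"""


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers (kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def freeable_kb(fields):
    """MemFree + Buffers + Cached, the approximation used in the text."""
    return fields["MemFree"] + fields["Buffers"] + fields["Cached"]


info = parse_meminfo(SAMPLE_MEMINFO)
total = freeable_kb(info)
print(f"Freeable ~{total / 1024**2:.1f} GiB, "
      f"of which truly free: {info['MemFree'] / 1024**2:.1f} GiB")
```

In this made-up sample almost all of the "freeable" figure is page cache, which is why a large FreeableMemory says little about how much RAM is actually idle.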

Why the kernel still swaps out with high FreeableMemory:

The kernel's memory reclaim algorithm does NOT simply check "is FreeableMemory > 0?" before deciding whether to swap. Instead, it runs the LRU scanning algorithm described in Section 4, which balances between:

  • Dropping file-backed pages (shrinking the page cache)
  • Swapping out anonymous pages (moving process memory to disk)

Although the DB engine cgroup is configured to strongly prefer dropping page cache over swapping, swap-out can still occur during global memory reclaim — when the entire system (not just the DB engine cgroup) is under pressure. During global reclaim, the kernel's kswapd process reclaims pages across all cgroups, and even processes inside a cgroup configured to avoid swapping may be swapped out.

This is why FreeableMemory (which includes reclaimable page cache) can remain high while swap-out still occurs — the kernel's global reclaim decisions are not bounded by a single cgroup's configuration.

Concrete Scenario

Instance: db.r5.xlarge (32 GB RAM)

Observed metrics:

  • CloudWatch FreeableMemory = 8 GB (mostly page cache from recent sequential scans)
  • Enhanced Monitoring swap.out = 12 MB/sec (pages being written to swap)
  • Enhanced Monitoring swap.in = 0 (no pages being read back from swap)
  • Enhanced Monitoring swap.free decreasing slowly (swap space being consumed)

What's happening:

  1. The instance has 8 GB of page cache from recent large sequential scans (e.g., pg_dump, reporting queries)
  2. Several backends connected 8+ hours ago are idle — their catalog caches, plan caches, and connection overhead memory haven't been touched
  3. A global memory pressure event triggers system-wide reclaim
  4. During global reclaim, kswapd scans all cgroups. Even though the DB engine cgroup is configured to prefer page cache eviction, the global reclaim path can still swap out cold anonymous pages from the engine cgroup
  5. FreeableMemory stays high (page cache is preserved) while swap-out occurs

When This Is Benign vs Problematic

| Condition | Assessment |
| --- | --- |
| swap.out > 0, swap.in = 0, FreeableMemory stable | Benign: cold pages moved to swap, never needed again |
| swap.out > 0, swap.in = 0, idle backends will disconnect soon | Benign: swapped memory will be freed when backends exit |
| swap.out > 0, swap.in > 0 intermittently | Monitor: some swapped pages are being accessed |
| swap.out > 0, swap.in sustained > 0, query latency increasing | Problematic: active workload hitting swapped pages |

How to read each pattern in practice

Pattern 1 — Benign: swap.out only, swap.in = 0

This is the most common swap pattern on RDS PostgreSQL and is almost always harmless. You'll see this on instances with long-lived idle connections (common with application servers that maintain persistent connection pools but only use a fraction of them at any given time).

What you'd see in Enhanced Monitoring:

swap.out:  50–200 kB/s (intermittent bursts, not sustained)
swap.in:   0 kB/s (consistently zero)
swap.free: Slowly decreasing over hours/days (e.g., 3.8 GB → 3.2 GB over 24 hours)

What you'd see in CloudWatch:

SwapUsage:      Slowly increasing (e.g., 200 MB → 800 MB over 24 hours)
FreeableMemory: Stable (e.g., 10–12 GB, normal fluctuations)

What to conclude: The kernel is moving cold anonymous pages (idle backend memory) to swap. Nobody is accessing those pages. This is the kernel doing its job — freeing RAM for active workloads. No action needed.

What to do: Nothing. If you want to prevent this entirely, implement connection pooling to eliminate long-lived idle backends.

Pattern 2 — Monitor: intermittent swap.in

You'll see this when a mostly-idle backend occasionally receives a query that touches its swapped-out catalog cache or plan cache. The swap-in is brief and infrequent.

What you'd see in Enhanced Monitoring:

swap.out:  0–100 kB/s
swap.in:   0 kB/s most of the time, occasional spikes to 50–500 kB/s lasting < 1 minute
swap.free: Stable (not decreasing further)

What you'd see in CloudWatch:

SwapUsage:      Stable (e.g., 500 MB, not growing)
FreeableMemory: Stable
Query latency:  No visible impact (individual queries may be 10–50 ms slower, but not enough to show in aggregate metrics)

What to conclude: Some swapped pages are being accessed, but infrequently enough that the performance impact is negligible. The swap-in resolves quickly and doesn't recur for the same pages (they stay in RAM after being faulted back in).

What to do: Monitor. If the frequency of swap.in spikes increases, or if they start correlating with user-visible latency, escalate to Pattern 3.

Pattern 3 — Problematic: sustained swap.in with latency impact

This is the pattern that requires action. You'll see this when the workload's active memory footprint genuinely exceeds available RAM, and backends are regularly accessing pages that keep getting swapped out and back in.

What you'd see in Enhanced Monitoring:

swap.out:  500 kB/s – 5 MB/s (sustained)
swap.in:   200 kB/s – 3 MB/s (sustained, not just spikes)
swap.free: Low and decreasing (e.g., < 500 MB of swap remaining)

What you'd see in CloudWatch:

SwapUsage:      High and growing (e.g., > 50% of total swap)
FreeableMemory: Low (e.g., < 5% of total RAM)
Query latency:  Visibly elevated — p99 latency 5–50× normal

What you'd see in Performance Insights:

Top wait events: IO:DataFileRead spike (swap-in manifests as generic I/O waits)
Top SQL:         Memory-intensive queries (hash joins, large sorts, aggregations)
os.memory.db.residentSetSize: Growing or at cgroup limit

What to conclude: The instance is under genuine memory pressure. The workload's active memory footprint exceeds what RAM can hold, and the kernel is thrashing — swapping pages out to make room, then swapping them back in when they're needed again.

What to do: This requires root cause analysis (see Section 8). Common remediations: reduce work_mem, reduce max_connections, add connection pooling, optimize memory-intensive queries, or resize to a larger instance class.

Visual comparison: Benign vs Problematic swap

BENIGN PATTERN (swap.out only, swap.in = 0):
                                                                    
  swap.out  ▕ ▄  ▄▄   ▄  ▄▄▄   ▄   ▄▄  ▄   ▄▄   ▄  ▄▄   ▄  ▄▄  
  (kB/s)    ▕▄█▄▄██▄▄▄█▄▄███▄▄▄█▄▄▄██▄▄█▄▄▄██▄▄▄█▄▄██▄▄▄█▄▄██▄▄
            ▕─────────────────────────────────────────────────────── 
  swap.in   ▕                                                       
  (kB/s)    ▕________________________________________________ zero  
            ▕                                                       
  latency   ▕──────────────────────────────────────────── flat, normal
            └──────────────────────────────────────────────── time →

  Assessment: ✅ BENIGN — cold pages moving to swap, never accessed.
  Action: None.


PROBLEMATIC PATTERN (sustained swap.in + swap.out, latency impact):

  swap.out  ▕    ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
  (kB/s)    ▕▄▄▄▄████████████████████████████████████████████████████
            ▕───────────────────────────────────────────────────────
  swap.in   ▕      ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
  (kB/s)    ▕▄▄▄▄▄▄██████████████████████████████████████████████████
            ▕───────────────────────────────────────────────────────
  latency   ▕         ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
  (ms)      ▕─────────████████████████████████████████████████████████
            └──────────────────────────────────────────────────────── time →

  Assessment: ❌ PROBLEMATIC — thrashing. Pages swapped out and immediately needed.
  Action: Investigate root cause (Section 8). Likely needs resize or work_mem tuning.

Key Insight

FreeableMemory is NOT a guarantee against swap-out. It measures "memory the kernel could reclaim if it chose to drop page cache" — but the kernel's actual reclaim decisions are governed by the LRU algorithm, cgroup limits, cgroup configuration, and global reclaim behavior — not by a simple "is there free memory?" threshold.

To determine if swap is a problem, you must look at swap.in (are swapped pages being accessed?) and correlate with query latency — not just FreeableMemory.
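One simple way to check the correlation is a Pearson coefficient over aligned swap.in and latency samples. The sketch below uses the per-minute values from the Section 4 timeline; `pearson` is a plain implementation written for the example, and how you export the two series from Enhanced Monitoring and your latency source is left open.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series (e.g. swap.in always 0) cannot correlate
    return cov / sqrt(var_x * var_y)


swap_in_kbps = [0, 0, 4800, 2400, 200, 0]   # 1:58 PM .. 2:03 PM
latency_ms = [2, 2, 85, 45, 8, 2]

print(f"correlation: {pearson(swap_in_kbps, latency_ms):.3f}")
```

A value close to 1 over the incident window suggests the latency spike tracks swap-in, whereas a benign pattern with swap.in flat at zero yields no correlation at all.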


6. Common Causes of Swap on RDS PostgreSQL

Undersized Instance Classes

The most straightforward cause: the instance's physical RAM is insufficient for the workload's memory footprint. This is common when:

  • Workload grew organically without corresponding instance upsizing
  • Instance was sized for average load but not peak load
  • Migration from on-premises where the server had more RAM

Indicators: Sustained swap usage that correlates with normal workload patterns (not just spikes). FreeableMemory consistently below 10% of total RAM.

High max_connections with Active Backends Consuming work_mem

PostgreSQL's process-per-connection model means each active backend independently allocates memory. The aggregate is unbounded:

Worst-case per-backend memory = work_mem × max_plan_nodes_per_query
Worst-case aggregate = active_backends × worst-case per-backend memory

Example: max_connections = 500, work_mem = 256 MB, 100 backends simultaneously executing 2-node hash joins:

  • 100 × 2 × 256 MB = 50 GB of work_mem alone
  • Plus shared_buffers, OS overhead, etc.
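The arithmetic in this example can be checked with a short calculation. This is a minimal sketch: the function name and its parameters are illustrative and not part of any AWS or PostgreSQL API. It treats work_mem as a hard per-node ceiling, which is a simplification (hash-based nodes may use more than work_mem when hash_mem_multiplier is above 1).

```python
def worst_case_work_mem_mb(active_backends: int, nodes_per_query: int, work_mem_mb: int) -> int:
    """Upper-bound estimate of aggregate work_mem consumption, in MB."""
    # Each memory-consuming plan node may allocate up to work_mem.
    per_backend_mb = work_mem_mb * nodes_per_query
    # Every active backend can reach that per-backend figure at the same time.
    return active_backends * per_backend_mb


total_mb = worst_case_work_mem_mb(active_backends=100, nodes_per_query=2, work_mem_mb=256)
print(f"{total_mb} MB = {total_mb / 1024:.0f} GB")  # 51200 MB = 50 GB
```

Comparing this figure with the instance's RAM minus shared_buffers gives a quick sense of whether a given work_mem setting is safe at peak concurrency.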

Indicators: Swap spikes correlate with connection count spikes. pg_stat_activity shows many backends in active state with hash join or sort operations.

Memory-Intensive Queries

Specific query patterns that consume disproportionate memory:

| Query Pattern | Memory Consumer | Why It's Expensive |
| --- | --- | --- |
| Hash joins on large tables | work_mem per hash node | Hash table must fit in memory or spill to disk |
| Large sorts (ORDER BY without index) | work_mem per sort node | Sort buffers allocated per operation |
| Hash aggregation (GROUP BY) | work_mem for hash table | Distinct groups stored in memory |
| Materialization (CTEs, subqueries) | work_mem for materialized rows | Entire result set buffered |
| Large IN lists / ANY arrays | Per-backend memory | Array expansion in memory |

Autovacuum Workers Consuming maintenance_work_mem

Each autovacuum worker allocates up to autovacuum_work_mem (or maintenance_work_mem if not set) for dead tuple tracking:

Autovacuum memory = autovacuum_max_workers × autovacuum_work_mem
Example: 3 workers × 256 MB = 768 MB (autovacuum_max_workers = 3, maintenance_work_mem = 256 MB)

On instances with many tables and high write throughput, all workers may run simultaneously, each at maximum allocation. Combined with active query workload, this pushes total memory consumption over the edge.

Logical Replication Slots Consuming logical_decoding_work_mem

Each active logical replication slot allocates logical_decoding_work_mem (default 64 MB) for WAL decoding buffers. Multiple slots multiply this:

Logical replication memory = active_slots × logical_decoding_work_mem
Example: 5 slots × 64 MB = 320 MB

Additionally, if a replication slot falls behind (consumer is slow or disconnected), the WAL sender accumulates decoded changes in memory before spilling to disk, potentially consuming more than logical_decoding_work_mem temporarily.
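The autovacuum and logical replication formulas above can be combined into one rough budget for background memory. The sketch below is illustrative only: the function and key names are hypothetical, and it ignores the temporary overshoot described in the previous paragraph.

```python
def background_memory_mb(autovacuum_workers: int, autovacuum_work_mem_mb: int,
                         active_slots: int, logical_decoding_work_mem_mb: int) -> dict:
    """Estimate peak memory used by autovacuum workers and logical decoding, in MB."""
    autovacuum_mb = autovacuum_workers * autovacuum_work_mem_mb
    logical_mb = active_slots * logical_decoding_work_mem_mb
    return {
        "autovacuum_mb": autovacuum_mb,
        "logical_decoding_mb": logical_mb,
        "total_mb": autovacuum_mb + logical_mb,
    }


# Values taken from the two examples in this section.
estimate = background_memory_mb(autovacuum_workers=3, autovacuum_work_mem_mb=256,
                                active_slots=5, logical_decoding_work_mem_mb=64)
print(estimate)
```

With these inputs the autovacuum share is 768 MB and the logical decoding share is 320 MB, matching the figures above.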

Benign vs Problematic Swap: Decision Matrix

| swap.out | swap.in | FreeableMemory | Duration | Assessment | Action |
| --- | --- | --- | --- | --- | --- |
| Low | Zero | High | Any | Benign | None; cold pages moved to swap |
| Low | Zero | Low | Stable | Monitor | Watch for swap.in; consider upsizing |
| Moderate | Low | Low | Transient | Likely benign | Investigate peak workload; may self-resolve |
| Moderate | Moderate | Low | Sustained | Problematic | Investigate root cause; remediate |
| High | High | Near zero | Any | Critical | Immediate action: thrashing or near-OOM |
| Any | Any | N/A | With OOM kills | Emergency | Instance likely needs immediate resize |

7. Monitoring Swap on RDS PostgreSQL

Enhanced Monitoring Swap Counters

Enhanced Monitoring provides OS-level metrics via a monitoring agent on the instance. Swap metrics are in the swap group of the Enhanced Monitoring JSON payload.

| Metric Name | Unit | Description |
| --- | --- | --- |
| swap.total | kB | Total swap space provisioned on the instance |
| swap.free | kB | Swap space currently unused |
| swap.cached | kB | Swap space that is also cached in RAM (pages in both swap and memory) |
| swap.in | kB/sec | Rate of pages read from swap into RAM (swap-in) |
| swap.out | kB/sec | Rate of pages written from RAM to swap (swap-out) |

Example Enhanced Monitoring JSON output (swap section):

{
  "swap": {
    "total": 4194300,
    "free": 3145728,
    "cached": 524288,
    "in": 0,
    "out": 128
  }
}

Interpretation of this example:

  • 4 GB total swap provisioned
  • 3 GB free (1 GB currently used)
  • 512 MB of swap is also cached in RAM (recently swapped-in pages kept in both locations)
  • 0 kB/sec swap-in (no pages being read back — good)
  • 128 kB/sec swap-out (some pages being written to swap — investigate if sustained)
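The interpretation above can be reproduced programmatically. The following sketch parses the example payload and derives the used swap and activity flags; the summarize_swap helper and its output keys are hypothetical names chosen for illustration, and only the swap group shown in the example is assumed to be present.

```python
import json

# The swap section from the example Enhanced Monitoring payload above.
payload = """
{
  "swap": {"total": 4194300, "free": 3145728, "cached": 524288, "in": 0, "out": 128}
}
"""


def summarize_swap(swap: dict) -> dict:
    """Derive used swap (kB and percent) and swap-in/swap-out activity flags."""
    used_kb = swap["total"] - swap["free"]
    used_pct = round(100 * used_kb / swap["total"], 1) if swap["total"] else 0.0
    return {
        "used_kb": used_kb,
        "used_pct": used_pct,
        "swap_in_active": swap["in"] > 0,    # pages being read back from swap
        "swap_out_active": swap["out"] > 0,  # pages being written to swap
    }


summary = summarize_swap(json.loads(payload)["swap"])
print(summary)
```

For the example payload this reports roughly 1 GB (about 25%) of swap in use, no swap-in, and active swap-out.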

Performance Insights OS Counters

Performance Insights exposes swap and memory counters under the os.* namespace. These can be correlated with wait events and top SQL.

Swap counters:

| PI Counter Name | Unit | Description |
| --- | --- | --- |
| os.swap.total | kB | Total swap space |
| os.swap.free | kB | Free swap space |
| os.swap.cached | kB | Swap cached in RAM |
| os.swap.in | kB/sec | Swap-in rate |
| os.swap.out | kB/sec | Swap-out rate |

Memory counters (relevant to swap investigation):

| PI Counter Name | Unit | Description |
| --- | --- | --- |
| os.memory.free | kB | Free memory (MemFree from /proc/meminfo) |
| os.memory.cached | kB | Page cache size |
| os.memory.buffers | kB | Kernel buffer cache |
| os.memory.total | kB | Total physical RAM |
| os.memory.db.swap | kB | Swap used by the database process group |
| os.memory.db.residentSetSize | kB | RSS of the database process group |
| os.memory.outOfMemoryKillCount | count | Number of OOM kills since last restart |

CloudWatch Metrics

| Metric Name | Namespace | Unit | Description |
| --- | --- | --- | --- |
| SwapUsage | AWS/RDS | Bytes | Amount of swap space used on the instance |
| FreeableMemory | AWS/RDS | Bytes | Available RAM (MemFree + Buffers + Cached) |

Limitation: CloudWatch SwapUsage shows total swap consumed but not the rate of swap-in/swap-out. You need Enhanced Monitoring or Performance Insights for rate metrics.

Correlating PI Swap Counters with Wait Events and Top SQL

The power of Performance Insights for swap investigation is correlation. When you observe os.swap.in > 0:

  1. Check the top wait events at the same time: Look for IO:DataFileRead or IO:BufFileRead spikes that coincide with swap-in. Swap-in manifests as generic I/O waits because the kernel handles it transparently — the PostgreSQL backend sees a page fault resolved by disk I/O.

  2. Check top SQL at the same time: Identify which queries are executing during swap-in periods. Look for:

    • Queries with high temp_blks_read / temp_blks_written (spilling to disk, memory pressure)
    • Queries with high shared_blks_read (blocks that missed shared buffers and had to be read in; if huge pages are off, parts of shared buffers may themselves have been swapped)
    • Queries with long execution times that correlate with swap-in spikes
  3. Check os.memory.db.residentSetSize trend: If RSS is growing while os.memory.free is shrinking, the PostgreSQL engine is consuming more memory — likely from work_mem allocations.

Source: The os.memory.db.residentSetSize metric is defined in the Performance Insights counter metrics documentation. RSS measures physical RAM used by the process group, so an upward trend indicates increasing memory consumption by the PostgreSQL engine.

Threshold Patterns Indicating a Problem

| Pattern | Threshold | Interpretation |
| --- | --- | --- |
| swap.in sustained > 0 | > 100 kB/sec for > 5 minutes | Active workload hitting swapped pages |
| swap.out sustained high | > 1 MB/sec for > 10 minutes | Significant memory pressure |
| SwapUsage > 50% of total swap | Absolute value | Swap space being consumed rapidly |
| FreeableMemory < 5% of total RAM | Percentage | Very low headroom, OOM risk |
| os.memory.outOfMemoryKillCount > 0 | Any non-zero | OOM kills have occurred; critical |
| swap.in correlates with latency spike | Temporal correlation | Swap is directly causing performance degradation |
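The "sustained" thresholds above depend on how long a metric stays over its limit, not on a single reading. A minimal sketch of that check follows, assuming the metric has already been exported as (timestamp in seconds, value) pairs in time order; is_sustained is a hypothetical helper, and it counts a run that lasts exactly the minimum duration as sustained.

```python
from typing import Iterable, Tuple


def is_sustained(samples: Iterable[Tuple[float, float]], threshold: float, min_seconds: float) -> bool:
    """Return True if values stay above threshold for at least min_seconds without a break."""
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # first reading of a new run above the threshold
            if ts - run_start >= min_seconds:
                return True
        else:
            run_start = None  # the run is broken; start over
    return False


# swap.in (kB/sec) sampled every 60 s; readings from t=60 to t=360 are all above 100.
swap_in = [(0, 0), (60, 150), (120, 180), (180, 140), (240, 200), (300, 160), (360, 170), (420, 0)]
print(is_sustained(swap_in, threshold=100, min_seconds=300))  # True

# A single one-minute spike does not qualify.
spike = [(0, 0), (60, 500), (120, 0)]
print(is_sustained(spike, threshold=100, min_seconds=300))  # False
```

The same helper can be applied to swap.out with threshold=1024 and min_seconds=600 to mirror the second row of the table.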

7.1. Swap Risk Assessment: A Step-by-Step Checklist

This section provides an advisory-style assessment framework that ties the monitoring metrics from Section 7 into a structured risk evaluation. You can follow this checklist to quickly determine the severity of swap activity on an RDS PostgreSQL instance and decide on next steps.

Note: The thresholds below are guidelines, not hard rules. Every workload is different — a latency-sensitive OLTP system may need tighter thresholds than a batch analytics workload. Adapt the thresholds to your SLA and workload profile.

The Checklist

Step 1: Is swap being used at all?

| What to check | Where | How |
| --- | --- | --- |
| SwapUsage | CloudWatch (AWS/RDS) | Check if > 0 bytes |

  • SwapUsage = 0 bytes → 🟢 NO RISK. No swap activity. Stop here.
  • SwapUsage > 0 bytes → Continue to Step 2.

Step 2: Is swap being actively read (swap-in)?

| What to check | Where | How |
| --- | --- | --- |
| swap.in | Enhanced Monitoring (swap group) | Check rate in kB/sec |
| os.swap.in | Performance Insights (OS counters) | Same metric, alternate source |
  • swap.in = 0 kB/sec (consistently) → 🟢 LOW RISK. Swap is present but benign — cold pages were moved to swap and are not being accessed. This is normal on instances with long-lived idle connections.

    • Recommended action: No immediate action. Optionally, implement connection pooling to prevent idle backend memory from accumulating.
  • swap.in > 0 but < 100 kB/sec, intermittent (spikes < 1 minute) → 🟡 LOW-MEDIUM RISK. Occasional access to swapped pages. Likely an idle backend receiving an infrequent query.

    • Recommended action: Monitor. Check if swap.in frequency is increasing over days/weeks.
  • swap.in > 100 kB/sec, sustained (> 5 minutes) → Continue to Step 3.


Step 3: Is swap-in correlated with query latency?

| What to check | Where | How |
| --- | --- | --- |
| Average/p99 query latency | Performance Insights (DB load) | Compare latency during swap.in periods vs baseline |
| IO:DataFileRead wait events | Performance Insights (top waits) | Check for spikes coinciding with swap.in |
| Application-reported latency | Application metrics / logs | Cross-reference with swap.in timeline |
  • No latency impact (latency within normal range during swap.in) → 🟡 MEDIUM RISK. Swap-in is occurring but not yet impacting user-visible performance. The swapped pages may be from background processes (autovacuum, logical replication) rather than active query backends.

    • Recommended action: Investigate which processes are causing swap-in (check pg_stat_activity for recently-active backends). Monitor for escalation.
  • Latency increasing (p99 > 2× baseline during swap.in periods) → Continue to Step 4.


Step 4: What is the FreeableMemory trend?

| What to check | Where | How |
| --- | --- | --- |
| FreeableMemory | CloudWatch (AWS/RDS) | Check current value and 24-hour trend |
| FreeableMemory / total RAM | Calculated | Express as percentage of instance memory |
  • FreeableMemory stable, > 10% of total RAM → 🟠 MEDIUM-HIGH RISK. The instance has memory headroom, but the workload's active memory footprint is causing swap pressure. This is typically a workload tuning issue — work_mem too high, too many concurrent active backends, or memory-intensive queries.

    • Recommended action: Identify the memory pressure source (Section 8). Tune work_mem, reduce max_connections, or optimize queries.
  • FreeableMemory declining, < 10% of total RAM → 🔴 HIGH RISK. The instance is running low on memory. Swap is being used as a crutch for insufficient RAM.

    • Recommended action: Identify root cause (Section 8). Likely needs instance resize or aggressive parameter tuning.
  • FreeableMemory < 2% of total RAM or approaching zero → Continue to Step 5.


Step 5: Have OOM kills occurred?

| What to check | Where | How |
| --- | --- | --- |
| os.memory.outOfMemoryKillCount | Performance Insights (OS counters) | Check if > 0 |
| PostgreSQL error log | RDS console → Logs | Search for "Out of memory" or "oom-killer" |
| Instance restart events | RDS Events | Check for unexpected restarts |
  • No OOM kills → 🔴 HIGH RISK. The instance is under severe memory pressure but hasn't crashed yet. Swap is the only thing preventing OOM kills.

    • Recommended action: Immediate workload reduction (kill memory-intensive queries, reduce connections) + plan instance resize.
  • OOM kills detected → 🔴 CRITICAL. The instance has exhausted both RAM and swap. Processes have been killed.

    • Recommended action: Immediate instance resize. If resize requires downtime, reduce workload immediately (reduce max_connections, kill heavy queries, pause batch jobs).

Quick Reference: Risk Summary Table

| Risk Level | swap.in | Latency Impact | FreeableMemory | OOM Kills | Action |
| --- | --- | --- | --- | --- | --- |
| 🟢 No Risk | N/A | N/A | N/A | No | SwapUsage = 0. Nothing to do. |
| 🟢 Low | 0 kB/s | None | Stable | No | Benign. Monitor only. |
| 🟡 Low-Medium | < 100 kB/s, intermittent | None | Stable | No | Monitor. Consider connection pooling. |
| 🟡 Medium | > 100 kB/s, sustained | None | Stable, > 10% | No | Investigate source. Monitor for escalation. |
| 🟠 Medium-High | > 100 kB/s, sustained | Yes (< 2× baseline) | Stable, > 10% | No | Tune work_mem, connections, or queries. |
| 🔴 High | Sustained | Yes (> 2× baseline) | Declining, < 10% | No | Root cause analysis. Plan resize. |
| 🔴 Critical | Sustained | Severe | Near zero | Yes | Immediate resize or workload reduction. |
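The five checklist steps can also be expressed as a small function, which may help when wiring the assessment into a runbook or alerting script. This is a sketch under stated assumptions, not an official AWS tool: assess_swap_risk and its parameters are hypothetical, the thresholds are the guideline values from the checklist, and cases the checklist does not spell out (for example, FreeableMemory between 2% and 10% regardless of trend) are mapped to the nearest listed level.

```python
def assess_swap_risk(swap_usage_bytes: int,
                     swap_in_kb_per_sec: float = 0.0,
                     swap_in_sustained: bool = False,
                     p99_latency_ratio: float = 1.0,
                     freeable_pct: float = 100.0,
                     oom_kill_count: int = 0) -> str:
    """Walk the checklist in order and return the first risk level that applies."""
    # Step 1: is swap being used at all?
    if swap_usage_bytes == 0:
        return "NO RISK"
    # Step 2: is swap being actively read?
    if swap_in_kb_per_sec == 0:
        return "LOW"
    if swap_in_kb_per_sec <= 100 or not swap_in_sustained:
        return "LOW-MEDIUM"
    # Step 3: is swap-in correlated with latency (p99 relative to baseline)?
    if p99_latency_ratio <= 2.0:
        return "MEDIUM"
    # Step 4: how much memory headroom remains (percent of total RAM)?
    if freeable_pct > 10.0:
        return "MEDIUM-HIGH"
    if freeable_pct >= 2.0:
        return "HIGH"
    # Step 5: have OOM kills occurred?
    return "CRITICAL" if oom_kill_count > 0 else "HIGH"


print(assess_swap_risk(swap_usage_bytes=512 * 1024**2))  # LOW
print(assess_swap_risk(512 * 1024**2, swap_in_kb_per_sec=400, swap_in_sustained=True,
                       p99_latency_ratio=3.0, freeable_pct=1.0, oom_kill_count=2))  # CRITICAL
```

Because the function follows the checklist order strictly, OOM kills are only considered once the earlier steps have escalated; a production check would likely alert on any non-zero OOM kill count independently.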

8. Troubleshooting Swap on RDS PostgreSQL

Step-by-Step Diagnosis Workflow

Step 1: Confirm swap is actually occurring

Check Enhanced Monitoring or Performance Insights:

  • Is swap.out > 0? (Pages being moved to swap)
  • Is swap.in > 0? (Pages being read back — this is the performance concern)
  • What is SwapUsage, or equivalently swap.total - swap.free? (Total swap consumed)

Step 2: Determine if swap is benign or problematic

  • If swap.in = 0: Swap is benign. Cold pages were moved to swap and nobody needs them. Monitor but no immediate action needed.
  • If swap.in > 0 but low and transient: Likely benign. Brief access to cold pages.
  • If swap.in sustained > 0 and correlates with query latency: Problematic. Proceed to Step 3.

Step 3: Identify the memory pressure source

Check at the time of swap activity:

  • pg_stat_activity: How many backends are active? What are they doing?
  • Performance Insights top SQL: Which queries are consuming the most resources?
  • work_mem usage: Are queries doing hash joins, sorts, or aggregations?
  • Autovacuum: Are multiple autovacuum workers running simultaneously?
  • Connections: Is the connection count unusually high?

Step 4: Identify what's being swapped

  • Check os.memory.db.residentSetSize vs os.memory.db.swap: Is the PostgreSQL engine's memory being swapped?
  • Check Enhanced Monitoring process list: Are there many idle backends with high VSZ but low RSS? (Indicates their memory was swapped out)
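One way to make the first comparison concrete is to express the database's swapped memory as a share of its combined resident and swapped footprint. The sketch below uses made-up counter values; db_swapped_fraction is a hypothetical helper, and the ratio is only a rough indicator because RSS accounting for a process group is itself approximate.

```python
def db_swapped_fraction(rss_kb: int, swap_kb: int) -> float:
    """Share of the database's memory footprint (RSS + swap) that currently sits in swap."""
    footprint_kb = rss_kb + swap_kb
    return swap_kb / footprint_kb if footprint_kb else 0.0


# Hypothetical readings: os.memory.db.residentSetSize = 12 GB, os.memory.db.swap = 1 GB (both in kB).
fraction = db_swapped_fraction(rss_kb=12 * 1024 * 1024, swap_kb=1 * 1024 * 1024)
print(f"{fraction:.1%} of the database footprint is in swap")
```

A small, stable fraction alongside zero swap.in is consistent with the benign pattern; a growing fraction alongside sustained swap.in points to the database's working memory being swapped.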

Step 5: Determine root cause and remediate

Based on findings from Steps 3-4, apply the appropriate remediation from the list below.

Decision Tree: Is Swap Benign or Problematic?

[Figure: Decision Tree: Is Swap Benign or Problematic]

Remediation Options

Instance Resizing:

  • Most direct solution for undersized instances
  • Vertical scaling: move to a larger instance class (e.g., db.r5.2xlarge → db.r5.4xlarge)
  • Consider the memory-to-vCPU ratio: r class instances have more memory per vCPU than m class

Parameter Tuning:

| Parameter | Tuning Direction | Rationale |
| --- | --- | --- |
| work_mem | Decrease | Reduces per-backend memory consumption; queries spill to disk instead of consuming RAM |
| maintenance_work_mem | Decrease | Reduces autovacuum worker memory; vacuum takes longer but uses less RAM |
| autovacuum_max_workers | Decrease | Fewer concurrent workers = less aggregate memory |
| max_connections | Decrease | Fewer potential backends = lower peak memory |
| logical_decoding_work_mem | Decrease | Reduces per-slot decode buffer size |
| shared_buffers | Decrease (rare) | Only if huge pages are NOT active and shared_buffers is being swapped |
| huge_pages | Set to 'on' | Ensures shared_buffers is pinned; eliminates largest swap candidate |

Query Optimization:

  • Identify queries consuming excessive work_mem (hash joins on large tables without appropriate indexes)
  • Add indexes to convert hash joins to nested loop or merge joins
  • Break large queries into smaller batches
  • Use SET LOCAL work_mem = '64MB' for specific queries that need less memory

Connection Pooling:

  • Implement Amazon RDS Proxy or application-side connection pooling (PgBouncer, pgpool-II)
  • Reduces the number of PostgreSQL backends, directly reducing aggregate per-backend memory
  • Idle client connections wait in the pooler instead of each holding a dedicated PostgreSQL backend, so they don't consume backend memory

Huge Pages Verification:

  • Confirm huge_pages = 'on' (not 'try') in the parameter group
  • Verify the instance class supports huge pages (most r5, r6g, m5, m6g classes do)

References

AWS Documentation

PostgreSQL Documentation

Linux Kernel Documentation