Swap in AWS RDS PostgreSQL Database Instances
Swap activity on Amazon RDS for PostgreSQL is one of the most frequently misunderstood signals during memory-related investigations. Many users see SwapUsage > 0 in CloudWatch and assume something is wrong, when in most cases the swap is benign — cold, idle backend memory moved to disk that nobody is accessing. This article helps readers distinguish between harmless swap and genuine memory pressure by explaining how Linux swap works on RDS PostgreSQL instances, what the monitoring metrics actual
1. Introduction
This article provides a comprehensive guide to understanding, monitoring, and troubleshooting swap on Amazon RDS for PostgreSQL instances. It covers Linux swap fundamentals as they apply to managed database instances, PostgreSQL's memory allocation patterns, all relevant Enhanced Monitoring and Performance Insights counters, and practical troubleshooting workflows.
Scope: This article covers Amazon RDS for PostgreSQL ONLY. Amazon Aurora PostgreSQL has a fundamentally different compute/storage architecture (distributed storage layer, no local EBS for data) that changes the swap picture significantly.
2. Swap Fundamentals on RDS PostgreSQL Instances
What Is Swap at the Linux OS Level
Linux swap space is a disk-based extension of physical RAM. When the kernel's memory reclaim algorithm determines that physical RAM is under pressure, it moves less-recently-used memory pages from RAM to the swap device on disk. This frees physical RAM for active processes. Later, when a process accesses a page that was previously moved to swap, the kernel reads it back from disk into RAM before the process can continue.
Swap is implemented as either a dedicated swap partition or a swap file on a filesystem. On RDS PostgreSQL instances, swap is provisioned as a fixed-size swap area at the platform level — the size is determined by the instance class and is not customer-adjustable.
Linux reference: For the kernel's swap subsystem internals, see man 2 swapon, man 8 mkswap, and the kernel documentation on swap management.
How Much Swap Is Provisioned per Instance Class
The amount of swap space on an RDS PostgreSQL instance is not a fixed value — it varies by instance class and is provisioned at the infrastructure layer during instance creation. The swap size is not configurable by customers and is not exposed as a parameter group setting.
How swap size is determined:
The swap size is calculated based on the instance's total physical memory and the instance family/generation. The general relationship is:
- Burstable instances (T-class): Swap size is typically 2× the instance memory (e.g., a db.t3.micro with 1 GB RAM gets ~2 GB swap; a db.t3.small with 2 GB RAM gets ~4 GB swap). This larger relative swap allocation compensates for the smaller RAM on burstable instances, providing more overflow buffer.
- Standard/Memory-optimized instances (M-class, R-class): Swap size is typically a fixed fraction of instance memory, often in the range of 0.5–1× RAM for smaller sizes, tapering to a smaller fraction for very large instances. For example, a db.r5.large (16 GB RAM) may have ~2–4 GB swap, while a db.r5.8xlarge (256 GB RAM) may have ~4–8 GB swap.
How to check the actual swap size on your instance:
You can determine the exact swap size provisioned on any running RDS PostgreSQL instance using Enhanced Monitoring:
Enhanced Monitoring → swap.total (in kilobytes)
Or via Performance Insights:
os.swap.total (in kilobytes)
For example, if swap.total = 4194300, the instance has approximately 4 GB of swap space.
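As a quick check of that arithmetic, here is a minimal sketch that converts a swap.total reading to gigabytes. The function name is hypothetical, and the sketch assumes the metric is reported in units of 1,024 bytes.

```python
def swap_total_gb(swap_total_kb: int) -> float:
    """Convert a swap.total reading (kilobytes) to gigabytes (1 GB = 1024 * 1024 kB)."""
    return swap_total_kb / (1024 * 1024)


# The reading quoted in the text above
print(f"{swap_total_gb(4194300):.2f} GB")  # prints 4.00 GB
```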
Key takeaway: You cannot change the swap size on an RDS PostgreSQL instance. If the provisioned swap is insufficient for your workload's memory overflow needs, the correct remediation is to resize to a larger instance class (which provides both more RAM and more swap) or to reduce the workload's memory footprint through parameter tuning.
Why RDS PostgreSQL Instances Have Swap Space
Swap exists as a safety net against the OOM (Out-Of-Memory) Killer. Without swap, any transient spike in memory demand that exceeds physical RAM would immediately trigger the OOM Killer, terminating processes to reclaim memory. On an RDS PostgreSQL instance, an OOM kill of the main postgres server process causes a full instance restart: all connections dropped, all in-flight transactions lost, crash recovery on startup.
Swap provides an overflow buffer that absorbs short-lived memory pressure, giving the system time to stabilize without killing processes. The trade-off: swap is backed by EBS storage, which is orders of magnitude slower than RAM. Sustained reliance on swap introduces significant latency, but it prevents catastrophic process death during transient memory spikes.
Memory Control Groups (cgroups) on RDS PostgreSQL
RDS PostgreSQL instances use Linux cgroups (control groups) to isolate and limit memory usage between the database engine and platform processes. Understanding cgroups is important for diagnosing memory and swap issues because the cgroup configuration directly affects how the kernel reclaims memory and decides whether to swap.
What are cgroups?
Cgroups are a Linux kernel feature that groups processes into hierarchical units and applies resource limits (memory, CPU, I/O) to each group independently.
Linux reference: See man 7 cgroups for the general cgroups overview and the kernel documentation on memory cgroups (cgroup v1) for details.
On RDS PostgreSQL instances, the PostgreSQL engine processes (the main postgres server process, backends, background workers) run inside a dedicated memory cgroup. This cgroup enforces a hard memory limit and is configured so that the kernel strongly prefers dropping page cache over swapping out PostgreSQL process memory. This means anonymous pages (backend memory, work_mem allocations) are only swapped when file-backed pages are nearly exhausted within the cgroup.
Why this matters for swap behavior:
The DB engine cgroup does not get 100% of the instance's physical RAM. A portion is reserved for platform processes, kernel structures, and page tables. The approximate relationship is:
DB engine memory limit ≈ total_instance_memory - platform_and_kernel_reserve - hugepage_reserved
DB engine memory+swap limit ≈ DB engine memory limit + (total_swap × 0.9)
When the PostgreSQL engine's memory usage approaches its cgroup limit, the kernel reclaims memory within the cgroup — first dropping page cache, then (reluctantly, given the cgroup's preference for page cache eviction) swapping out anonymous pages. If both memory and swap limits are exhausted, the OOM killer is invoked within the cgroup.
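To make the approximate relationship above concrete, the following sketch evaluates both limits for a hypothetical instance. The function name and all input values (64 GB RAM, 4 GB reserve, 16 GB of huge pages, 4 GB swap) are illustrative assumptions, not published RDS figures.

```python
def engine_cgroup_limits(total_memory_gb, reserve_gb, hugepage_gb, total_swap_gb):
    """Return (memory_limit, memory_plus_swap_limit) in GB.

    Mirrors the approximate relationship described in the text;
    every input is an illustrative assumption.
    """
    memory_limit = total_memory_gb - reserve_gb - hugepage_gb
    memory_swap_limit = memory_limit + total_swap_gb * 0.9
    return memory_limit, memory_swap_limit


# Hypothetical instance: 64 GB RAM, 4 GB reserve, 16 GB huge pages, 4 GB swap
mem, mem_swap = engine_cgroup_limits(64, 4, 16, 4)
print(f"memory limit: {mem:.1f} GB, memory+swap limit: {mem_swap:.1f} GB")
```

With these inputs the engine would have roughly 44 GB of RAM and about 47.6 GB of RAM plus swap before an OOM kill inside the cgroup becomes possible.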
How this surfaces in customer-visible monitoring:
You cannot inspect cgroup files directly on RDS instances, but the effects are visible through Enhanced Monitoring and Performance Insights:
| Linux concept (under the hood) | What you see in monitoring |
|---|---|
| RSS (Resident Set Size) — anonymous pages in RAM | os.memory.db.residentSetSize in Performance Insights |
| Swap usage within the engine cgroup | os.memory.db.swap in Performance Insights |
| Page faults (major = swap-in) | swap.in rate in Enhanced Monitoring |
| Total memory pressure | FreeableMemory in CloudWatch (system-wide, not cgroup-specific) |
Linux reference: RSS, page cache, and swap are standard Linux memory concepts. See man 5 proc (search for VmRSS) and the kernel memory cgroup documentation for how these are tracked per cgroup.
Swap-In vs Swap-Out: Definitions and Performance Implications
Swap-Out occurs when the kernel writes memory pages from RAM to the swap device. It generates write I/O to EBS. Swap-out is the kernel's response to memory pressure — it is evicting pages that haven't been accessed recently to free RAM for active workloads.
Swap-In occurs when a process accesses a memory page that was previously swapped out. The kernel must read that page back from the swap device into RAM before the process can continue. Swap-in generates read I/O latency — the PostgreSQL backend or background worker is stalled waiting for a disk read.
Performance impact hierarchy:
| Pattern | Severity | Meaning |
|---|---|---|
| Swap-out only, no swap-in | Benign | Cold pages moved to swap, never accessed again |
| Low swap-in, stable swap usage | Low concern | Occasional access to cold pages |
| Sustained swap-in | Problematic | Active workload hitting swapped pages regularly |
| High swap-in + high swap-out (thrashing) | Critical | Feedback loop — pages evicted and immediately needed again |
Relationship Between Swap and the OOM Killer
The OOM Killer is the kernel's last resort when both physical RAM and swap space are exhausted. The progression:
- Memory pressure increases → kernel shrinks page cache
- Page cache minimized → kernel starts swapping out anonymous pages
- Swap space fills up → kernel cannot satisfy allocation requests
- OOM Killer invoked → selects process with highest oom_score and kills it
Linux reference: See man 5 proc (search for oom_score and oom_score_adj) and the kernel documentation on OOM killer behavior for how the kernel selects which process to kill.
On RDS PostgreSQL instances:
- OOM kill of a backend process → that client connection is terminated, in-flight transaction aborted, other connections unaffected
- OOM kill of the **main postgres server process** → full instance restart, all connections dropped, crash recovery
Swap delays the OOM Killer but does not prevent it when the workload's total memory footprint exceeds RAM + swap.
3. RDS PostgreSQL Memory Architecture
Instance Memory Layout
An RDS PostgreSQL instance runs on a dedicated host with a fixed amount of physical RAM shared among several major consumers: shared_buffers, per-backend private memory, the OS page cache, and OS/platform overhead.
shared_buffers with Huge Pages (Pinned) vs Without (Swappable)
shared_buffers is the single largest memory allocation on any RDS PostgreSQL instance. It defaults to approximately 25% of instance RAM (formula: {DBInstanceClassMemory/32768}).
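The 25% figure follows from the units: shared_buffers is expressed in 8 KB blocks, so dividing a byte count by 32768 and multiplying by 8192 yields one quarter of the input. The sketch below works through that arithmetic. The function name is hypothetical, and it assumes DBInstanceClassMemory equals the full 64 GB, whereas the value RDS substitutes is typically somewhat lower than the nominal instance RAM.

```python
BLOCK_SIZE = 8192  # PostgreSQL's default block size in bytes


def default_shared_buffers_bytes(db_instance_class_memory: int) -> int:
    """Evaluate {DBInstanceClassMemory/32768} (a count of 8 KB blocks) in bytes."""
    blocks = db_instance_class_memory // 32768
    return blocks * BLOCK_SIZE


# Assumption: the formula sees the full 64 GB of a hypothetical instance
ram_bytes = 64 * 1024**3
print(default_shared_buffers_bytes(ram_bytes) / 1024**3)  # prints 16.0
```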
With huge pages enabled (default on most RDS PostgreSQL instance classes):
- The shared_buffers segment is backed by Linux huge pages (2 MB pages on x86_64)
- Huge pages are pinned in physical RAM by the kernel and are NEVER swapped
- The kernel cannot reclaim huge-page-backed memory under any circumstances
- This removes the largest memory consumer from the kernel's swap candidate pool entirely
- Page table overhead is reduced by ~512x (one PTE per 2 MB instead of per 4 KB)
Linux reference: See man 5 proc (search for Hugepagesize) and the kernel documentation on HugeTLB pages for how huge pages are allocated and why they are pinned (non-swappable, non-reclaimable).
Without huge pages (smaller instance classes or fallback):
- shared_buffers is backed by regular 4 KB pages
- These pages ARE swappable: the kernel can and will evict them under memory pressure
- Swapping shared buffers is catastrophic: every buffer pool access that hits a swapped page stalls on EBS I/O
- Page table overhead is significant: a 4 GB shared_buffers segment requires ~8 MB of page table entries per process that maps it
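The page table figures quoted for both cases can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming 8-byte page table entries (as on x86_64) and counting only last-level entries; the function name is hypothetical.

```python
PTE_BYTES = 8  # size of one page table entry on x86_64


def page_table_bytes(segment_bytes: int, page_bytes: int) -> int:
    """Bytes of last-level page table entries needed to map a segment."""
    return (segment_bytes // page_bytes) * PTE_BYTES


segment = 4 * 1024**3                            # 4 GB shared_buffers segment
small = page_table_bytes(segment, 4 * 1024)      # regular 4 KB pages
huge = page_table_bytes(segment, 2 * 1024**2)    # 2 MB huge pages

print(small // 1024**2, "MB per process with 4 KB pages")  # prints 8 MB ...
print(huge // 1024, "KB per process with 2 MB pages")      # prints 16 KB ...
print(small // huge, "x reduction")                        # prints 512 x reduction
```

Because every backend that maps the segment carries its own page tables, the 4 KB case scales with connection count, while the huge page case stays negligible.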
How to verify huge pages status:
-- Check the huge_pages parameter setting
SHOW huge_pages; -- 'on', 'try', or 'off'
Per-Backend Memory
Each PostgreSQL backend (one per client connection) allocates private memory:
| Component | Idle | Active (complex query) | Swappable? |
|---|---|---|---|
| Process image + stack | ~5 MB | ~5 MB | Yes |
| Catalog cache | ~1–5 MB | ~5–50 MB | Yes |
| Plan cache | ~1–5 MB | ~5–20 MB | Yes |
| work_mem (per plan node) | 0 | 4 MB – 1 GB+ per node | Yes |
| temp_buffers | 0 | Up to temp_buffers setting | Yes |
| Extension memory contexts | Varies | Varies | Yes |
Critical: work_mem is allocated per plan node per query. A single query with 4 hash join nodes at work_mem = 256 MB consumes up to 1 GB. Multiply by concurrent backends and the aggregate can easily exceed available RAM.
Double Buffering with EBS
RDS PostgreSQL exhibits "double buffering" because data flows through both shared_buffers and the OS page cache:
- PostgreSQL reads a page not in shared_buffers → issues read() syscall
- Kernel checks page cache → if miss, reads from EBS into page cache
- Kernel copies data from page cache into the shared_buffers segment
- Same data now exists in both shared_buffers AND page cache
This means the effective "cache" is larger than shared_buffers alone — the page cache provides a second tier. But it also means memory is used less efficiently than it could be. The page cache portion is reclaimable (the kernel drops it under pressure), while the shared_buffers portion (with huge pages) is pinned.
For writes, dirty pages flow: shared_buffers → page cache (via write()) → EBS (via kernel writeback). The page cache acts as a write-back buffer between PostgreSQL and EBS.
4. When Pages Are Moved Out to Swap (Swap-Out Use Case)
This section explains the kernel's page reclaim algorithm and provides a concrete scenario of swap-out on an RDS PostgreSQL instance.
What Triggers the Kernel to Swap Out Pages
The kernel's memory reclaim is triggered when free memory drops below the low watermark (derived from vm.min_free_kbytes). At this point, the kswapd daemon wakes up and begins scanning memory zones to reclaim pages. If kswapd cannot reclaim fast enough, direct reclaim occurs synchronously in the context of the allocating process (which stalls that process).
The reclaim algorithm must choose which pages to evict. It has two pools to draw from:
- File-backed pages (page cache): Pages that are backed by a file on disk. Clean file-backed pages can simply be dropped (they can be re-read from EBS). Dirty file-backed pages must be flushed to EBS first, then dropped.
- Anonymous pages: Pages that have no file backing, such as process heap, stack, and mmap(MAP_ANONYMOUS|MAP_PRIVATE) allocations. These pages can only be reclaimed by writing them to the swap device. This is swap-out.
Linux reference: The distinction between file-backed and anonymous pages is fundamental to the kernel's page reclaim. See the kernel documentation on page reclaim concepts and man 2 mmap for the MAP_ANONYMOUS flag. The active/inactive LRU lists described below are documented in the kernel source at mm/vmscan.c.
The Kernel's Page Reclaim Algorithm
The kernel maintains two LRU (Least Recently Used) lists per memory zone:
- Active list: Pages that have been accessed recently (considered "hot")
- Inactive list: Pages that haven't been accessed recently (candidates for eviction)
Pages are promoted from inactive → active when accessed. Pages are demoted from active → inactive when they age out (haven't been accessed for a while). Eviction happens from the tail of the inactive list — the coldest pages are evicted first.
Both anonymous and file-backed pages have their own active/inactive LRU lists. The kernel's reclaim algorithm decides how aggressively to scan each list based on a tunable called swappiness (see kernel documentation):
scan_ratio = anon_pages_scanned / file_pages_scanned
At higher swappiness (e.g., 60, the Linux default):
The kernel's baseline weighting is roughly 60:140 (anonymous:file), so anonymous pages are scanned at a bit under half the rate of file-backed pages.
It prefers to drop file-backed pages but WILL swap out anonymous pages too.
At low swappiness (e.g., 0):
The kernel almost exclusively drops file-backed pages.
Anonymous pages are only swapped when file-backed pages are nearly exhausted.
On RDS PostgreSQL, the DB engine cgroup is configured with a low swappiness, meaning the kernel strongly prefers to drop page cache rather than swap out PostgreSQL process memory. However, during global reclaim (system-wide memory pressure), even processes in a low-swappiness cgroup can be swapped out.
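The effect of swappiness on the scan ratio can be illustrated with a simplified model. In mainline kernels the baseline weights are swappiness for anonymous pages and 200 minus swappiness for file-backed pages; the kernel then adjusts them by how costly recent reclaim was, which this sketch deliberately ignores. The function name is hypothetical, and the swappiness value actually used by the RDS engine cgroup is not published, so the values below are examples only.

```python
def scan_weights(swappiness: int) -> tuple[int, int]:
    """Baseline (anon, file) scan weights for a swappiness in 0..100.

    Simplified: the real kernel also scales these by recent reclaim cost.
    """
    if not 0 <= swappiness <= 100:
        raise ValueError("this sketch only models swappiness 0..100")
    return swappiness, 200 - swappiness


for s in (0, 10, 60):
    anon, file_weight = scan_weights(s)
    ratio = anon / file_weight
    print(f"swappiness={s}: anon:file = {anon}:{file_weight} (ratio {ratio:.2f})")
```

At swappiness 0 the anonymous weight is zero, which matches the description of anonymous pages being swapped only once file-backed pages are nearly exhausted.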
Which Pages Get Swapped Out First
The kernel evicts pages from the tail of the inactive anonymous LRU list — these are the anonymous pages that have gone the longest without being accessed. On an RDS PostgreSQL instance, typical swap-out candidates are:
| Candidate | Why it's cold |
|---|---|
| Idle backend memory | Backend connected but not executing queries for hours |
| Cold catalog caches | Cached metadata for tables/schemas not recently queried |
| Unused extension memory contexts | Extensions that allocated memory at load time but aren't actively used |
| Old autovacuum worker memory | Worker finished but process hasn't been recycled |
| Logical replication decode buffers | Slot exists but replication is paused or slow |
Concrete Scenario: Burst of Concurrent Hash Joins
Consider an RDS PostgreSQL db.r5.2xlarge instance (64 GB RAM):
- shared_buffers = 16 GB (huge pages, pinned)
- OS + platform processes + kernel = ~4 GB
- Page cache = ~20 GB (caching EBS data)
- Available for backends = ~24 GB (page cache is reclaimable)
- work_mem = 512 MB
- max_connections = 200, typically 30 active
Normal state: 30 active backends × ~50 MB average working memory = ~1.5 GB. Plenty of headroom.
Burst event: A reporting workload kicks off. 60 backends simultaneously execute complex analytical queries with hash joins:
- Each backend allocates work_mem for each hash join node. With two hash join nodes per query: 60 × 2 × 512 MB = up to 60 GB demand
- Total memory demand now exceeds physical RAM
- Kernel's kswapd wakes up, begins reclaiming:
  - First: drops clean page cache pages (EBS data that can be re-read)
  - Page cache shrinks from 20 GB → 2 GB (kernel keeps a minimum)
  - Still not enough: up to 60 GB demand + 16 GB pinned shared_buffers + 4 GB OS = 80 GB > 64 GB
  - Kernel begins scanning the anonymous inactive LRU list
  - Finds cold pages: idle backends that haven't executed queries in hours, old catalog cache entries, unused extension memory
  - Swaps out those cold anonymous pages to make room for the active hash join allocations
Result: swap.out counter increases in Enhanced Monitoring. The swapped pages belong to idle backends and cold memory regions.
The Impact: Swap-Out Is Only a Problem When Followed by Swap-In
The swap-out itself is not the performance problem. The pages that were swapped out were cold — nobody was using them. The system successfully accommodated the burst workload by moving unused memory to disk.
The problem arises IF those idle backends later become active and access their swapped-out memory. At that point, swap-in occurs — the backend stalls waiting for EBS to return the page. If the swapped pages remain cold indefinitely (the idle backends disconnect, or they only execute simple queries that don't touch the swapped regions), the swap-out was entirely harmless.
Concrete Scenario: Idle Backends Resume Activity (Swap-In Impact)
Continuing from the hash join burst scenario above — let's see what happens 2 hours later when the swapped-out pages are actually needed.
Setup (2 hours after the burst):
The reporting workload has finished. The 60 analytical backends have disconnected. The instance is back to its normal 30 active backends. However, 15 of the original idle backends — whose catalog caches, plan caches, and connection state were swapped out during the burst — are still connected and have been idle the entire time.
The trigger: At 2:00 PM, the application's afternoon batch cycle begins. Those 15 idle backends simultaneously receive new queries — a mix of SELECT statements against tables they haven't touched since their memory was swapped out.
What happens, step by step:
- Backend PID 12345 receives SELECT * FROM orders WHERE customer_id = 42
- The backend needs to look up the orders table in its catalog cache (pg_class, pg_attribute entries), but those pages were swapped out 2 hours ago
- The CPU triggers a major page fault: the kernel must read the page from the swap device (EBS) back into RAM
- The backend is stalled and cannot proceed until the swap-in I/O completes
- EBS swap-in latency: typically 0.5–2 ms per 4 KB page, but can spike to 5–10 ms under EBS contention
- The backend may need multiple swapped pages (catalog cache, plan cache, connection state), and each triggers a separate page fault
- First query latency: instead of the normal 2 ms, it takes 50–200 ms as dozens of pages are faulted in from swap
- Subsequent queries on the same backend are fast again because the pages are now back in RAM
Multiply by 15 backends resuming simultaneously:
Each of the 15 backends goes through the same page fault storm. The aggregate swap-in I/O creates a burst of EBS read requests from the swap device, competing with normal data I/O.
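The first-query latency range quoted in the steps above follows from multiplying the number of faulted pages by the per-page EBS latency. The sketch below assumes the faults are served strictly one after another with no swap readahead, which is a simplification; the function name and the page count of 100 are illustrative assumptions.

```python
def swap_in_stall_ms(pages: int, per_page_latency_ms: float) -> float:
    """Total stall if each swapped page costs one serial EBS read."""
    return pages * per_page_latency_ms


# 100 pages of 4 KB each is about 400 KB of catalog/plan cache
print(swap_in_stall_ms(100, 0.5))  # prints 50.0  (typical low-end latency)
print(swap_in_stall_ms(100, 2.0))  # prints 200.0 (typical high-end latency)
print(swap_in_stall_ms(100, 10.0)) # prints 1000.0 (under EBS contention)
```

The last line shows why contention matters: the same modest amount of swapped memory can turn into a one-second stall when per-page latency degrades.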
What this looks like in monitoring:
Timeline (Enhanced Monitoring, 1-minute granularity):
Time swap.in swap.out FreeableMemory Avg query latency
───────── ──────── ──────── ────────────── ─────────────────
1:58 PM 0 kB/s 0 kB/s 12 GB 2 ms
1:59 PM 0 kB/s 0 kB/s 12 GB 2 ms
2:00 PM 4,800 kB/s 0 kB/s 11.5 GB 85 ms ← backends resume
2:01 PM 2,400 kB/s 0 kB/s 11.2 GB 45 ms ← still faulting in
2:02 PM 200 kB/s 0 kB/s 11.0 GB 8 ms ← most pages back in RAM
2:03 PM 0 kB/s 0 kB/s 11.0 GB 2 ms ← normal
Key observations:
- swap.in spikes to ~4.8 MB/sec at 2:00 PM, which is consistent with 15 backends each faulting in catalog/plan cache pages at ~320 KB/sec
- swap.out stays at 0 — no new pages are being evicted, there's enough RAM now
- FreeableMemory drops slightly (12 → 11 GB) as swapped pages return to RAM, displacing some page cache
- Query latency spikes 40× (2 ms → 85 ms) during the swap-in storm, then recovers within 2–3 minutes
- In Performance Insights, you'd see IO:DataFileRead wait events spike at 2:00 PM. Swap-in manifests as generic I/O waits because the kernel handles page faults transparently and PostgreSQL has no dedicated wait event for them
Contrast — what if those backends had disconnected and reconnected instead?
If the idle backends had disconnected during the burst and reconnected at 2:00 PM, there would be no swap-in at all. Fresh backends start with empty catalog caches and plan caches — they build them from scratch by reading from shared_buffers or EBS (normal I/O path, not swap). The first-query latency would still be slightly higher than steady-state (cold cache), but it would be normal PostgreSQL cache-warming behavior, not swap-induced page fault storms.
This is why connection pooling (Amazon RDS Proxy, PgBouncer) is one of the most effective mitigations for swap-related performance issues — poolers recycle connections, preventing long-lived idle backends from accumulating cold swapped-out memory that later causes swap-in storms when reactivated.
5. Can FreeableMemory Stay Sufficient While Swap-Out Still Happens?
YES. This is absolutely possible and is one of the most common sources of confusion when investigating swap on RDS PostgreSQL instances.
Why This Happens
What FreeableMemory actually measures:
CloudWatch FreeableMemory ≈ MemFree + Buffers + Cached
Source & caveat: The AWS CloudWatch documentation for RDS describes FreeableMemory as "the amount of available random access memory." On Linux (since kernel 3.14), this corresponds to MemAvailable from /proc/meminfo, which the kernel calculates as approximately MemFree + Buffers + Cached minus non-reclaimable portions (e.g., shared memory segments that cannot be dropped). The formula above is a commonly used approximation; the actual kernel calculation is slightly more nuanced. For practical purposes, the approximation is sufficient for understanding swap behavior.
This includes:
- MemFree: Truly unused RAM (usually very small on a healthy system)
- Buffers: Kernel buffer cache (small)
- Cached: OS page cache (can be very large, often 10–30+ GB)
The page cache is counted as "freeable" because the kernel CAN drop it if needed — it's backed by files on EBS that can be re-read. So FreeableMemory of 15 GB might mean: 500 MB truly free + 14.5 GB of page cache.
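The approximation can be sketched against /proc/meminfo-style text. You cannot read /proc/meminfo on an RDS instance, so the sample below is invented data chosen to match the 15 GB example; the function names are hypothetical, and the sum deliberately omits the non-reclaimable adjustments the kernel applies when computing MemAvailable.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: kilobytes}."""
    fields = {}
    for line in text.strip().splitlines():
        name, value = line.split(":", 1)
        fields[name.strip()] = int(value.split()[0])
    return fields


def approx_freeable_kb(meminfo: dict[str, int]) -> int:
    """MemFree + Buffers + Cached, the approximation used in the text."""
    return meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]


# Invented sample values for illustration only
SAMPLE = """
MemFree:          512000 kB
Buffers:          102400 kB
Cached:         15114240 kB
"""

info = parse_meminfo(SAMPLE)
print(approx_freeable_kb(info) / 1024**2)  # prints 15.0 (GB)
```

Only about 0.5 GB of that total is truly free; the rest is page cache that the kernel may or may not choose to drop.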
Why the kernel still swaps out with high FreeableMemory:
The kernel's memory reclaim algorithm does NOT simply check "is FreeableMemory > 0?" before deciding whether to swap. Instead, it runs the LRU scanning algorithm described in Section 4, which balances between:
- Dropping file-backed pages (shrinking the page cache)
- Swapping out anonymous pages (moving process memory to disk)
Although the DB engine cgroup is configured to strongly prefer dropping page cache over swapping, swap-out can still occur during global memory reclaim — when the entire system (not just the DB engine cgroup) is under pressure. During global reclaim, the kernel's kswapd process reclaims pages across all cgroups, and even processes inside a cgroup configured to avoid swapping may be swapped out.
This is why FreeableMemory (which includes reclaimable page cache) can remain high while swap-out still occurs — the kernel's global reclaim decisions are not bounded by a single cgroup's configuration.
Concrete Scenario
Instance: db.r5.xlarge (32 GB RAM)
Observed metrics:
- CloudWatch FreeableMemory = 8 GB (mostly page cache from recent sequential scans)
- Enhanced Monitoring swap.out = 12 MB/sec (pages being written to swap)
- Enhanced Monitoring swap.in = 0 (no pages being read back from swap)
- Enhanced Monitoring swap.free decreasing slowly (swap space being consumed)
What's happening:
- The instance has 8 GB of page cache from recent large sequential scans (e.g., pg_dump, reporting queries)
- Several backends connected 8+ hours ago are idle, so their catalog caches, plan caches, and connection overhead memory haven't been touched
- A global memory pressure event triggers system-wide reclaim
- During global reclaim, kswapd scans all cgroups. Even though the DB engine cgroup is configured to prefer page cache eviction, the global reclaim path can still swap out cold anonymous pages from the engine cgroup
- FreeableMemory stays high (page cache is preserved) while swap-out occurs
When This Is Benign vs Problematic
| Condition | Assessment |
|---|---|
| swap.out > 0, swap.in = 0, FreeableMemory stable | Benign: cold pages moved to swap, never needed again |
| swap.out > 0, swap.in = 0, idle backends will disconnect soon | Benign: swapped memory will be freed when backends exit |
| swap.out > 0, swap.in > 0 intermittently | Monitor: some swapped pages are being accessed |
| swap.out > 0, swap.in sustained > 0, query latency increasing | Problematic: active workload hitting swapped pages |
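The table above can be read as a small decision procedure. The sketch below is one hypothetical way to encode it over a window of Enhanced Monitoring samples. The function name, the input format (lists of kB/s rates), and the sustained_fraction threshold are all assumptions made for illustration; the article does not define a numeric cutoff for "sustained".

```python
def classify_swap(swap_out_kbps, swap_in_kbps, latency_rising, sustained_fraction=0.5):
    """Classify swap activity from per-interval rate samples (kB/s).

    sustained_fraction is a hypothetical threshold: swap-in counts as
    "sustained" when at least this share of samples is non-zero.
    """
    if not any(swap_out_kbps) and not any(swap_in_kbps):
        return "no swap activity"
    nonzero_in = sum(1 for rate in swap_in_kbps if rate > 0)
    if nonzero_in == 0:
        return "benign"
    if nonzero_in / len(swap_in_kbps) >= sustained_fraction and latency_rising:
        return "problematic"
    return "monitor"


print(classify_swap([120, 0, 80], [0, 0, 0], False))            # benign
print(classify_swap([50, 0, 0], [0, 300, 0], False))            # monitor
print(classify_swap([900, 800, 950], [400, 350, 500], True))    # problematic
```

Note that swap-out never appears in the decision beyond the first check: as the table shows, the verdict depends on swap-in and latency.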
How to read each pattern in practice
Pattern 1 — Benign: swap.out only, swap.in = 0
This is the most common swap pattern on RDS PostgreSQL and is almost always harmless. You'll see this on instances with long-lived idle connections (common with application servers that maintain persistent connection pools but only use a fraction of them at any given time).
What you'd see in Enhanced Monitoring:
swap.out: 50–200 kB/s (intermittent bursts, not sustained)
swap.in: 0 kB/s (consistently zero)
swap.free: Slowly decreasing over hours/days (e.g., 3.8 GB → 3.2 GB over 24 hours)
What you'd see in CloudWatch:
SwapUsage: Slowly increasing (e.g., 200 MB → 800 MB over 24 hours)
FreeableMemory: Stable (e.g., 10–12 GB, normal fluctuations)
What to conclude: The kernel is moving cold anonymous pages (idle backend memory) to swap. Nobody is accessing those pages. This is the kernel doing its job — freeing RAM for active workloads. No action needed.
What to do: Nothing. If you want to prevent this entirely, implement connection pooling to eliminate long-lived idle backends.
Pattern 2 — Monitor: intermittent swap.in
You'll see this when a mostly-idle backend occasionally receives a query that touches its swapped-out catalog cache or plan cache. The swap-in is brief and infrequent.
What you'd see in Enhanced Monitoring:
swap.out: 0–100 kB/s
swap.in: 0 kB/s most of the time, occasional spikes to 50–500 kB/s lasting < 1 minute
swap.free: Stable (not decreasing further)
What you'd see in CloudWatch:
SwapUsage: Stable (e.g., 500 MB, not growing)
FreeableMemory: Stable
Query latency: No visible impact (individual queries may be 10–50 ms slower, but not enough to show in aggregate metrics)
What to conclude: Some swapped pages are being accessed, but infrequently enough that the performance impact is negligible. The swap-in resolves quickly and doesn't recur for the same pages (they stay in RAM after being faulted back in).
What to do: Monitor. If the frequency of swap.in spikes increases, or if they start correlating with user-visible latency, escalate to Pattern 3.
Pattern 3 — Problematic: sustained swap.in with latency impact
This is the pattern that requires action. You'll see this when the workload's active memory footprint genuinely exceeds available RAM, and backends are regularly accessing pages that keep getting swapped out and back in.
What you'd see in Enhanced Monitoring:
swap.out: 500 kB/s – 5 MB/s (sustained)
swap.in: 200 kB/s – 3 MB/s (sustained, not just spikes)
swap.free: Low and decreasing (e.g., < 500 MB of swap remaining)
What you'd see in CloudWatch:
SwapUsage: High and growing (e.g., > 50% of total swap)
FreeableMemory: Low (e.g., < 5% of total RAM)
Query latency: Visibly elevated — p99 latency 5–50× normal
What you'd see in Performance Insights:
Top wait events: IO:DataFileRead spike (swap-in manifests as generic I/O waits)
Top SQL: Memory-intensive queries (hash joins, large sorts, aggregations)
os.memory.db.residentSetSize: Growing or at cgroup limit
What to conclude: The instance is under genuine memory pressure. The workload's active memory footprint exceeds what RAM can hold, and the kernel is thrashing — swapping pages out to make room, then swapping them back in when they're needed again.
What to do: This requires root cause analysis (see Section 8). Common remediations: reduce work_mem, reduce max_connections, add connection pooling, optimize memory-intensive queries, or resize to a larger instance class.
Visual comparison: Benign vs Problematic swap
BENIGN PATTERN (swap.out only, swap.in = 0):
swap.out ▕ ▄ ▄▄ ▄ ▄▄▄ ▄ ▄▄ ▄ ▄▄ ▄ ▄▄ ▄ ▄▄
(kB/s) ▕▄█▄▄██▄▄▄█▄▄███▄▄▄█▄▄▄██▄▄█▄▄▄██▄▄▄█▄▄██▄▄▄█▄▄██▄▄
▕───────────────────────────────────────────────────────
swap.in ▕
(kB/s) ▕________________________________________________ zero
▕
latency ▕──────────────────────────────────────────── flat, normal
└──────────────────────────────────────────────── time →
Assessment: ✅ BENIGN — cold pages moving to swap, never accessed.
Action: None.
PROBLEMATIC PATTERN (sustained swap.in + swap.out, latency impact):
swap.out ▕ ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
(kB/s) ▕▄▄▄▄████████████████████████████████████████████████████
▕───────────────────────────────────────────────────────
swap.in ▕ ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
(kB/s) ▕▄▄▄▄▄▄██████████████████████████████████████████████████
▕───────────────────────────────────────────────────────
latency ▕ ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
(ms) ▕─────────████████████████████████████████████████████████
└──────────────────────────────────────────────────────── time →
Assessment: ❌ PROBLEMATIC — thrashing. Pages swapped out and immediately needed.
Action: Investigate root cause (Section 8). Likely needs resize or work_mem tuning.
Key Insight
FreeableMemory is NOT a guarantee against swap-out. It measures "memory the kernel could reclaim if it chose to drop page cache" — but the kernel's actual reclaim decisions are governed by the LRU algorithm, cgroup limits, cgroup configuration, and global reclaim behavior — not by a simple "is there free memory?" threshold.
To determine if swap is a problem, you must look at swap.in (are swapped pages being accessed?) and correlate with query latency — not just FreeableMemory.
6. Common Causes of Swap on RDS PostgreSQL
Undersized Instance Classes
The most straightforward cause: the instance's physical RAM is insufficient for the workload's memory footprint. This is common when:
- Workload grew organically without corresponding instance upsizing
- Instance was sized for average load but not peak load
- Migration from on-premises where the server had more RAM
Indicators: Sustained swap usage that correlates with normal workload patterns (not just spikes). FreeableMemory consistently below 10% of total RAM.
High max_connections with Active Backends Consuming work_mem
PostgreSQL's process-per-connection model means each active backend independently allocates memory. The aggregate is unbounded:
Worst-case per-backend memory = work_mem × max_plan_nodes_per_query
Worst-case aggregate = active_backends × worst-case per-backend memory
Example: max_connections = 500, work_mem = 256 MB, 100 backends simultaneously executing 2-node hash joins:
- 100 × 2 × 256 MB = 50 GB of work_mem alone
- Plus shared_buffers, OS overhead, etc.
Indicators: Swap spikes correlate with connection count spikes. pg_stat_activity shows many backends in active state with hash join or sort operations.
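The worst-case arithmetic above can be sketched as a small helper. This is only an illustration of the formula; the function name and its parameters are hypothetical, and real consumption is usually well below this upper bound because not every plan node fills its work_mem allowance.

```python
def worst_case_work_mem_mb(active_backends: int, work_mem_mb: int, nodes_per_query: int) -> int:
    """Upper-bound estimate of aggregate work_mem usage, in MB."""
    # Worst case per backend: every memory-hungry plan node uses its full work_mem.
    per_backend_mb = work_mem_mb * nodes_per_query
    # Worst case aggregate: every active backend hits that ceiling at the same time.
    return active_backends * per_backend_mb


# The example from the text: 100 active backends, work_mem = 256 MB, 2-node hash joins.
estimate_mb = worst_case_work_mem_mb(active_backends=100, work_mem_mb=256, nodes_per_query=2)
print(f"{estimate_mb} MB = {estimate_mb / 1024:.0f} GB")  # 51200 MB = 50 GB
```

Comparing this figure against the instance's RAM minus shared_buffers gives a rough sense of how much headroom a given work_mem setting leaves.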
Memory-Intensive Queries
Specific query patterns that consume disproportionate memory:
| Query Pattern | Memory Consumer | Why It's Expensive |
|---|---|---|
| Hash joins on large tables | work_mem per hash node | Hash table must fit in memory or spill to disk |
| Large sorts (ORDER BY without index) | work_mem per sort node | Sort buffers allocated per operation |
| Hash aggregation (GROUP BY) | work_mem for hash table | Distinct groups stored in memory |
| Materialization (CTEs, subqueries) | work_mem for materialized rows | Entire result set buffered |
| Large IN lists / ANY arrays | Per-backend memory | Array expansion in memory |
Autovacuum Workers Consuming maintenance_work_mem
Each autovacuum worker allocates up to autovacuum_work_mem (or maintenance_work_mem if not set) for dead tuple tracking:
Autovacuum memory = autovacuum_max_workers × autovacuum_work_mem
Default: 3 × 256 MB = 768 MB (if maintenance_work_mem = 256 MB)
On instances with many tables and high write throughput, all workers may run simultaneously, each at maximum allocation. Combined with active query workload, this pushes total memory consumption over the edge.
Logical Replication Slots Consuming logical_decoding_work_mem
Each active logical replication slot allocates logical_decoding_work_mem (default 64 MB) for WAL decoding buffers. Multiple slots multiply this:
Logical replication memory = active_slots × logical_decoding_work_mem
Example: 5 slots × 64 MB = 320 MB
Additionally, if a replication slot falls behind (consumer is slow or disconnected), the WAL sender accumulates decoded changes in memory before spilling to disk, potentially consuming more than logical_decoding_work_mem temporarily.
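The two background-memory formulas in this section (autovacuum workers and logical replication slots) can be combined into one rough estimate. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it deliberately ignores the temporary overshoot described above for lagging slots.

```python
def background_memory_mb(autovacuum_workers: int, autovacuum_work_mem_mb: int,
                         active_slots: int, logical_decoding_work_mem_mb: int) -> dict:
    """Rough ceiling for autovacuum and logical decoding memory, in MB."""
    autovacuum_mb = autovacuum_workers * autovacuum_work_mem_mb
    logical_mb = active_slots * logical_decoding_work_mem_mb
    return {
        "autovacuum_mb": autovacuum_mb,
        "logical_decoding_mb": logical_mb,
        "total_mb": autovacuum_mb + logical_mb,
    }


# Values from the text: 3 workers at 256 MB each, 5 slots at 64 MB each.
print(background_memory_mb(3, 256, 5, 64))
# {'autovacuum_mb': 768, 'logical_decoding_mb': 320, 'total_mb': 1088}
```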
Benign vs Problematic Swap: Decision Matrix
| swap.out | swap.in | FreeableMemory | Duration | Assessment | Action |
|---|---|---|---|---|---|
| Low | Zero | High | Any | Benign | None — cold pages moved to swap |
| Low | Zero | Low | Stable | Monitor | Watch for swap.in; consider upsizing |
| Moderate | Low | Low | Transient | Likely benign | Investigate peak workload; may self-resolve |
| Moderate | Moderate | Low | Sustained | Problematic | Investigate root cause; remediate |
| High | High | Near zero | Any | Critical | Immediate action — thrashing or near-OOM |
| Any | Any | N/A | With OOM kills | Emergency | Instance likely needs immediate resize |
7. Monitoring Swap on RDS PostgreSQL
Enhanced Monitoring Swap Counters
Enhanced Monitoring provides OS-level metrics via a monitoring agent on the instance. Swap metrics are in the swap group of the Enhanced Monitoring JSON payload.
| Metric Name | Unit | Description |
|---|---|---|
| swap.total | kB | Total swap space provisioned on the instance |
| swap.free | kB | Swap space currently unused |
| swap.cached | kB | Swap space that is also cached in RAM (pages in both swap and memory) |
| swap.in | kB/sec | Rate of pages read from swap into RAM (swap-in) |
| swap.out | kB/sec | Rate of pages written from RAM to swap (swap-out) |
Example Enhanced Monitoring JSON output (swap section):
{ "swap": { "total": 4194300, "free": 3145728, "cached": 524288, "in": 0, "out": 128 } }
Interpretation of this example:
- 4 GB total swap provisioned
- 3 GB free (1 GB currently used)
- 512 MB of swap is also cached in RAM (recently swapped-in pages kept in both locations)
- 0 kB/sec swap-in (no pages being read back — good)
- 128 kB/sec swap-out (some pages being written to swap — investigate if sustained)
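The interpretation above can be reproduced programmatically. The following is a minimal sketch that assumes the payload has the same shape as the sample shown in this section; the function name `summarize_swap` is hypothetical.

```python
import json


def summarize_swap(payload: str) -> dict:
    """Derive used swap and activity flags from an Enhanced Monitoring style payload."""
    swap = json.loads(payload)["swap"]
    used_kb = swap["total"] - swap["free"]
    used_pct = round(100 * used_kb / swap["total"], 1) if swap["total"] else 0.0
    return {
        "used_kb": used_kb,
        "used_pct": used_pct,
        "swapping_in": swap["in"] > 0,    # pages being read back from swap
        "swapping_out": swap["out"] > 0,  # pages being written to swap
    }


sample = '{ "swap": { "total": 4194300, "free": 3145728, "cached": 524288, "in": 0, "out": 128 } }'
print(summarize_swap(sample))
```

For the sample payload this reports about 1 GB (25%) of swap in use, no swap-in, and some swap-out, which matches the interpretation above.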
Performance Insights OS Counters
Performance Insights exposes swap and memory counters under the os.* namespace. These can be correlated with wait events and top SQL.
Swap counters:
| PI Counter Name | Unit | Description |
|---|---|---|
| os.swap.total | kB | Total swap space |
| os.swap.free | kB | Free swap space |
| os.swap.cached | kB | Swap cached in RAM |
| os.swap.in | kB/sec | Swap-in rate |
| os.swap.out | kB/sec | Swap-out rate |
Memory counters (relevant to swap investigation):
| PI Counter Name | Unit | Description |
|---|---|---|
| os.memory.free | kB | Free memory (MemFree from /proc/meminfo) |
| os.memory.cached | kB | Page cache size |
| os.memory.buffers | kB | Kernel buffer cache |
| os.memory.total | kB | Total physical RAM |
| os.memory.db.swap | kB | Swap used by the database process group |
| os.memory.db.residentSetSize | kB | RSS of the database process group |
| os.memory.outOfMemoryKillCount | count | Number of OOM kills since last restart |
CloudWatch Metrics
| Metric Name | Namespace | Unit | Description |
|---|---|---|---|
| SwapUsage | AWS/RDS | Bytes | Amount of swap space used on the instance |
| FreeableMemory | AWS/RDS | Bytes | Available RAM (MemFree + Buffers + Cached) |
Limitation: CloudWatch SwapUsage shows total swap consumed but not the rate of swap-in/swap-out. You need Enhanced Monitoring or Performance Insights for rate metrics.
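To read these two metrics programmatically, one option is the CloudWatch `get_metric_statistics` API. The sketch below takes the client as a parameter so the function itself needs only the standard library; the helper name, the 60-second period, and the instance identifier "mydb" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def latest_rds_metric(cloudwatch, instance_id: str, metric_name: str, minutes: int = 60):
    """Return the most recent 1-minute average of an AWS/RDS metric, or None if no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort before picking the last.
    datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None


# Usage (requires boto3 and AWS credentials; "mydb" is a placeholder identifier):
# import boto3
# cw = boto3.client("cloudwatch")
# swap_bytes = latest_rds_metric(cw, "mydb", "SwapUsage")
# freeable_bytes = latest_rds_metric(cw, "mydb", "FreeableMemory")
```

Both values come back in bytes, so dividing FreeableMemory by the instance's total RAM gives the percentage used in the thresholds later in this section.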
Correlating PI Swap Counters with Wait Events and Top SQL
The power of Performance Insights for swap investigation is correlation. When you observe os.swap.in > 0:
- Check the top wait events at the same time: Look for IO:DataFileRead or IO:BufFileRead spikes that coincide with swap-in. Swap-in manifests as generic I/O waits because the kernel handles it transparently: the PostgreSQL backend sees a page fault resolved by disk I/O.
- Check top SQL at the same time: Identify which queries are executing during swap-in periods. Look for:
  - Queries with high temp_blks_read/temp_blks_written (spilling to disk, memory pressure)
  - Queries with high shared_blks_read (reading from shared buffers that may have been swapped if huge pages are off)
  - Queries with long execution times that correlate with swap-in spikes
- Check the os.memory.db.residentSetSize trend: If RSS is growing while os.memory.free is shrinking, the PostgreSQL engine is consuming more memory, likely from work_mem allocations.
Source: The os.memory.db.residentSetSize metric is defined in the Performance Insights counter metrics documentation. RSS measures physical RAM used by the process group, so an upward trend indicates increasing memory consumption by the PostgreSQL engine.
Threshold Patterns Indicating a Problem
| Pattern | Threshold | Interpretation |
|---|---|---|
| swap.in sustained > 0 | > 100 kB/sec for > 5 minutes | Active workload hitting swapped pages |
| swap.out sustained high | > 1 MB/sec for > 10 minutes | Significant memory pressure |
| SwapUsage > 50% of total swap | Absolute value | Swap space being consumed rapidly |
| FreeableMemory < 5% of total RAM | Percentage | Very low headroom, OOM risk |
| os.memory.outOfMemoryKillCount > 0 | Any non-zero | OOM kills have occurred (critical) |
| swap.in correlates with latency spike | Temporal correlation | Swap is directly causing performance degradation |
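The "sustained" patterns in this table depend on how long a rate stays above its threshold, not on a single sample. A minimal sketch of that check follows; the function name and the sample format (a time-ordered list of `(epoch_seconds, value)` pairs) are assumptions, and the duration test is "at least" rather than "strictly more than" so that evenly spaced samples behave intuitively.

```python
def sustained_above(samples, threshold, min_duration_s):
    """True if values stay above threshold for a continuous run of at least min_duration_s.

    samples: iterable of (epoch_seconds, value) pairs, sorted by time.
    """
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # a new run above the threshold begins here
            if ts - run_start >= min_duration_s:
                return True
        else:
            run_start = None  # any dip to or below the threshold resets the run
    return False


# swap.in sampled once per minute at 150 kB/sec for five minutes straight.
steady = [(i * 60, 150) for i in range(6)]
print(sustained_above(steady, threshold=100, min_duration_s=300))  # True
```

A single dip below the threshold resets the run, so short bursts separated by quiet periods are not reported as sustained.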
7.1. Swap Risk Assessment: A Step-by-Step Checklist
This section provides an advisory-style assessment framework that ties the monitoring metrics from Section 7 into a structured risk evaluation. You can follow this checklist to quickly determine the severity of swap activity on an RDS PostgreSQL instance and decide on next steps.
Note: The thresholds below are guidelines, not hard rules. Every workload is different — a latency-sensitive OLTP system may need tighter thresholds than a batch analytics workload. Adapt the thresholds to your SLA and workload profile.
The Checklist
Step 1: Is swap being used at all?
| What to check | Where | How |
|---|---|---|
| SwapUsage | CloudWatch (AWS/RDS) | Check if > 0 bytes |
- SwapUsage = 0 bytes → ✅ NO RISK. No swap activity. Stop here.
- SwapUsage > 0 bytes → Continue to Step 2.
Step 2: Is swap being actively read (swap-in)?
| What to check | Where | How |
|---|---|---|
| swap.in | Enhanced Monitoring (swap group) | Check rate in kB/sec |
| os.swap.in | Performance Insights (OS counters) | Same metric, alternate source |
- swap.in = 0 kB/sec (consistently) → 🟢 LOW RISK. Swap is present but benign: cold pages were moved to swap and are not being accessed. This is normal on instances with long-lived idle connections.
  - Recommended action: No immediate action. Optionally, implement connection pooling to prevent idle backend memory from accumulating.
- swap.in > 0 but < 100 kB/sec, intermittent (spikes < 1 minute) → 🟡 LOW-MEDIUM RISK. Occasional access to swapped pages. Likely an idle backend receiving an infrequent query.
  - Recommended action: Monitor. Check if swap.in frequency is increasing over days/weeks.
- swap.in > 100 kB/sec, sustained (> 5 minutes) → Continue to Step 3.
Step 3: Is swap-in correlated with query latency?
| What to check | Where | How |
|---|---|---|
| Average/p99 query latency | Performance Insights (DB load) | Compare latency during swap.in periods vs baseline |
| IO:DataFileRead wait events | Performance Insights (top waits) | Check for spikes coinciding with swap.in |
| Application-reported latency | Application metrics / logs | Cross-reference with swap.in timeline |
- No latency impact (latency within normal range during swap.in) → 🟡 MEDIUM RISK. Swap-in is occurring but not yet impacting user-visible performance. The swapped pages may be from background processes (autovacuum, logical replication) rather than active query backends.
  - Recommended action: Investigate which processes are causing swap-in (check pg_stat_activity for recently-active backends). Monitor for escalation.
- Latency increasing (p99 > 2× baseline during swap.in periods) → Continue to Step 4.
Step 4: What is the FreeableMemory trend?
| What to check | Where | How |
|---|---|---|
| FreeableMemory | CloudWatch (AWS/RDS) | Check current value and 24-hour trend |
| FreeableMemory / total RAM | Calculated | Express as percentage of instance memory |
- FreeableMemory stable, > 10% of total RAM → 🟠 MEDIUM-HIGH RISK. The instance has memory headroom, but the workload's active memory footprint is causing swap pressure. This is typically a workload tuning issue: work_mem too high, too many concurrent active backends, or memory-intensive queries.
  - Recommended action: Identify the memory pressure source (Section 8). Tune work_mem, reduce max_connections, or optimize queries.
- FreeableMemory declining, < 10% of total RAM → 🔴 HIGH RISK. The instance is running low on memory. Swap is being used as a crutch for insufficient RAM.
  - Recommended action: Identify root cause (Section 8). Likely needs instance resize or aggressive parameter tuning.
- FreeableMemory < 2% of total RAM or approaching zero → Continue to Step 5.
Step 5: Have OOM kills occurred?
| What to check | Where | How |
|---|---|---|
| os.memory.outOfMemoryKillCount | Performance Insights (OS counters) | Check if > 0 |
| PostgreSQL error log | RDS console → Logs | Search for "Out of memory" or "oom-killer" |
| Instance restart events | RDS Events | Check for unexpected restarts |
- No OOM kills → 🔴 HIGH RISK. The instance is under severe memory pressure but hasn't crashed yet. Swap is the only thing preventing OOM kills.
  - Recommended action: Immediate workload reduction (kill memory-intensive queries, reduce connections) + plan instance resize.
- OOM kills detected → 🔴 CRITICAL. The instance has exhausted both RAM and swap. Processes have been killed.
  - Recommended action: Immediate instance resize. If resize requires downtime, reduce workload immediately (reduce max_connections, kill heavy queries, pause batch jobs).
Quick Reference: Risk Summary Table
| Risk Level | swap.in | Latency Impact | FreeableMemory | OOM Kills | Action |
|---|---|---|---|---|---|
| 🟢 No Risk | N/A | N/A | N/A | No | SwapUsage = 0. Nothing to do. |
| 🟢 Low | 0 kB/s | None | Stable | No | Benign. Monitor only. |
| 🟡 Low-Medium | < 100 kB/s, intermittent | None | Stable | No | Monitor. Consider connection pooling. |
| 🟡 Medium | > 100 kB/s, sustained | None | Stable, > 10% | No | Investigate source. Monitor for escalation. |
| 🟠 Medium-High | > 100 kB/s, sustained | Yes (> 2× baseline) | Stable, > 10% | No | Tune work_mem, connections, or queries. |
| 🔴 High | Sustained | Yes (> 2× baseline) | Declining, < 10% | No | Root cause analysis. Plan resize. |
| 🔴 Critical | Sustained | Severe | Near zero | Yes | Immediate resize or workload reduction. |
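The five checklist steps can be expressed as a single function that walks them in order. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the caller decides whether swap-in counts as sustained and supplies p99 latency as a ratio to baseline, and cases the checklist leaves open (for example, a latency rise below 2× baseline) are mapped to the lower of the two neighboring risk levels.

```python
def assess_swap_risk(swap_usage_bytes: int, swap_in_kbps: float, swap_in_sustained: bool,
                     p99_latency_ratio: float, freeable_pct: float, oom_kills: int) -> str:
    """Walk the checklist steps in order and return a risk label."""
    # Step 1: is swap being used at all?
    if swap_usage_bytes == 0:
        return "NO RISK"
    # Step 2: is swap being actively read?
    if swap_in_kbps == 0:
        return "LOW"
    if swap_in_kbps < 100 or not swap_in_sustained:
        return "LOW-MEDIUM"
    # Step 3: is swap-in correlated with latency? (assumption: <= 2x baseline counts as no impact)
    if p99_latency_ratio <= 2.0:
        return "MEDIUM"
    # Step 4: how much memory headroom is left?
    if freeable_pct > 10:
        return "MEDIUM-HIGH"
    if freeable_pct >= 2:
        return "HIGH"
    # Step 5: have OOM kills occurred?
    return "CRITICAL" if oom_kills > 0 else "HIGH"


print(assess_swap_risk(2**30, 500, True, 3.0, 5.0, 0))  # HIGH
```

Because the steps run in order, an instance with no swap-in is reported as low risk regardless of its FreeableMemory value, which mirrors how the checklist stops early.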
8. Troubleshooting Swap on RDS PostgreSQL
Step-by-Step Diagnosis Workflow
Step 1: Confirm swap is actually occurring
Check Enhanced Monitoring or Performance Insights:
- Is swap.out > 0? (Pages being moved to swap)
- Is swap.in > 0? (Pages being read back; this is the performance concern)
- What is SwapUsage, or swap.total - swap.free? (Total swap consumed)
Step 2: Determine if swap is benign or problematic
- If swap.in = 0: Swap is benign. Cold pages were moved to swap and nobody needs them. Monitor but no immediate action needed.
- If swap.in > 0 but low and transient: Likely benign. Brief access to cold pages.
- If swap.in sustained > 0 and correlates with query latency: Problematic. Proceed to Step 3.
Step 3: Identify the memory pressure source
Check at the time of swap activity:
- pg_stat_activity: How many backends are active? What are they doing?
- Performance Insights top SQL: Which queries are consuming the most resources?
- work_mem usage: Are queries doing hash joins, sorts, or aggregations?
- Autovacuum: Are multiple autovacuum workers running simultaneously?
- Connections: Is the connection count unusually high?
Step 4: Identify what's being swapped
- Check os.memory.db.residentSetSize vs os.memory.db.swap: Is the PostgreSQL engine's memory being swapped?
- Check Enhanced Monitoring process list: Are there many idle backends with high VSZ but low RSS? (Indicates their memory was swapped out)
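One way to make the first comparison concrete is to express the database's swapped memory as a share of its resident-plus-swapped footprint. The helper below is a hypothetical illustration; the inputs are the os.memory.db.residentSetSize and os.memory.db.swap values in kB, and the sample figures are made up.

```python
def db_swapped_fraction(rss_kb: int, swap_kb: int) -> float:
    """Share of the database process group's memory that currently sits in swap."""
    total_kb = rss_kb + swap_kb
    return swap_kb / total_kb if total_kb else 0.0


# Made-up values: 12 GB resident, 3 GB in swap.
fraction = db_swapped_fraction(rss_kb=12_000_000, swap_kb=3_000_000)
print(f"{fraction:.0%} of database memory is in swap")  # 20% of database memory is in swap
```

A high fraction alone is not a problem; it matters only when swap.in shows those pages are being read back.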
Step 5: Determine root cause and remediate
Based on findings from Steps 3-4, apply the appropriate remediation from the list below.
Decision Tree: Is Swap Benign or Problematic?
Remediation Options
Instance Resizing:
- Most direct solution for undersized instances
- Vertical scaling: move to a larger instance class (e.g., db.r5.2xlarge → db.r5.4xlarge)
- Consider the memory-to-vCPU ratio: r class instances have more memory per vCPU than m class
Parameter Tuning:
| Parameter | Tuning Direction | Rationale |
|---|---|---|
| work_mem | Decrease | Reduces per-backend memory consumption; queries spill to disk instead of consuming RAM |
| maintenance_work_mem | Decrease | Reduces autovacuum worker memory; vacuum takes longer but uses less RAM |
| autovacuum_max_workers | Decrease | Fewer concurrent workers = less aggregate memory |
| max_connections | Decrease | Fewer potential backends = lower peak memory |
| logical_decoding_work_mem | Decrease | Reduces per-slot decode buffer size |
| shared_buffers | Decrease (rare) | Only if huge pages are NOT active and shared_buffers is being swapped |
| huge_pages | Set to 'on' | Ensures shared_buffers is pinned; eliminates largest swap candidate |
Query Optimization:
- Identify queries consuming excessive work_mem (hash joins on large tables without appropriate indexes)
- Add indexes to convert hash joins to nested loop or merge joins
- Break large queries into smaller batches
- Use SET LOCAL work_mem = '64MB' inside a transaction to cap memory for specific queries that need less
Connection Pooling:
- Implement Amazon RDS Proxy or application-side connection pooling (PgBouncer, pgpool-II)
- Reduces the number of PostgreSQL backends, directly reducing aggregate per-backend memory
- Idle connections in the pool don't consume backend memory
Huge Pages Verification:
- Confirm huge_pages = 'on' (not 'try') in the parameter group
- Verify the instance class supports huge pages (most r5, r6g, m5, m6g classes do)
References
AWS Documentation
- Amazon RDS Enhanced Monitoring: OS metric definitions and JSON schema
- Amazon RDS Performance Insights: PI counter definitions and usage
- Performance Insights Counter Metrics: Detailed counter definitions including os.memory.db.residentSetSize
- Amazon RDS for PostgreSQL Best Practices: General best practices including memory sizing
- Amazon CloudWatch Metrics for RDS: CloudWatch metric definitions including SwapUsage and FreeableMemory
- Amazon RDS Proxy: Connection pooling for RDS
PostgreSQL Documentation
- PostgreSQL Resource Consumption Parameters: shared_buffers, work_mem, maintenance_work_mem, huge_pages
- PostgreSQL Memory Contexts: How PostgreSQL manages memory internally
- pg_stat_activity: Monitoring active backends and their state
Linux Kernel Documentation
- Linux VM Subsystem (Overcommit and OOM): Kernel memory overcommit and OOM killer behavior
- Linux Page Reclaim: LRU lists, page reclaim algorithm, zone watermarks
- Memory Cgroups (cgroup v1): memory.stat fields, memory limits, cgroup configuration
- HugeTLB Pages: Huge page allocation and behavior
- man 7 cgroups: Control groups overview
- man 2 mmap: Memory mapping, MAP_ANONYMOUS flag
- man 5 proc: /proc filesystem, VmRSS, oom_score