While the CloudWatch metrics mentioned in the primary answer are great for validation, the symptoms you describe point to a specific Windows behavior: memory exhaustion leading to pagefile thrashing.
1. The Root Cause: RAM vs. Disk
You mentioned 80% memory utilization (~13 GB of 16 GB). On a Windows jump host with multiple RDP sessions, Windows often hits a "soft limit" at this level and starts aggressively moving memory pages to pagefile.sys on your C: drive to keep the system responsive.
- The Culprits: The MsMpEng.exe (Defender) and AppXSVC (Updates) processes you observed are likely scanning those memory pages as they are swapped to disk.
- The Symptom: Your "High Read Latency" alert is likely the result of the Windows kernel hammering the disk to compensate for the lack of physical RAM.
2. How to validate (OS-Level)
In Windows Resource Monitor (resmon.exe), go to the Disk tab and sort by Response Time (ms):
- If you see the System process or pagefile.sys with high response times (100 ms+), your issue is memory-driven, not an EBS performance bottleneck.
- Check the Memory tab for "Hard Faults/sec". A high number of hard faults indicates that the instance is constantly hitting the disk to retrieve data that should be in RAM, which kills EBS performance.
3. Remediation without downtime
Since this is a shared Jump Host, try these non-disruptive steps:
- Exclude the Pagefile: Add an exclusion for the C:\pagefile.sys file in Windows Defender settings. This prevents Defender from scanning the pagefile and is a commonly recommended way to reduce I/O overhead.
- Check gp3 Throughput: You have 3000 IOPS, but also check the throughput (MiB/s) in CloudWatch. gp3 defaults to 125 MiB/s. If your paging activity exceeds this, you can increase the throughput limit in the AWS Console without any downtime.
- Right-sizing: If the load is consistent, your r5a.large (16 GB) is simply at its limit. Changing the instance type to an r5a.xlarge (32 GB), or to a newer generation such as r6a.xlarge or r7a.xlarge, requires a stop/start (a few minutes of downtime) but is usually the most cost-effective fix for "ghost" latency issues.
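For the gp3 throughput check above, the comparison against the 125 MiB/s default is simple arithmetic. The sketch below assumes you have already pulled the per-minute Sum of the VolumeReadBytes and VolumeWriteBytes CloudWatch metrics (these metric names are not from the answer above, and the sample byte counts are hypothetical).

```python
MIB = 1024 * 1024
GP3_DEFAULT_THROUGHPUT_MIB_S = 125  # gp3 default mentioned above


def avg_throughput_mib_s(read_bytes, write_bytes, period_s=60):
    """Average combined throughput in MiB/s over one CloudWatch period."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return (read_bytes + write_bytes) / period_s / MIB


# Hypothetical one-minute sums: 5 GiB read, 1 GiB written.
read_bytes = 5 * 1024 * MIB
write_bytes = 1 * 1024 * MIB

throughput = avg_throughput_mib_s(read_bytes, write_bytes)
utilization = throughput / GP3_DEFAULT_THROUGHPUT_MIB_S

print(f"{throughput:.1f} MiB/s, {utilization:.0%} of the gp3 default")
```

If the result sits near or above 100% during the latency alerts, raising the provisioned throughput is worth trying before resizing the instance.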
Summary: Your "Disk Latency" is likely a "Memory Problem" in disguise. Relieve the RAM pressure, and the EBS alerts will likely disappear.
1. How to validate high EBS read latency
CloudWatch Metrics:
You can monitor the average latency for read operations using the VolumeAvgReadLatency metric in CloudWatch. This metric shows the average time taken to complete read operations in a minute. The average is calculated based on I/O operations that completed in the last minute; if no operations completed, the value will be zero.
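As a rough illustration of how a per-minute average like this behaves, the sketch below assumes the latency is derived as total time spent on reads divided by completed reads, the way it is commonly computed from the older VolumeTotalReadTime and VolumeReadOps metrics. Those metric names and the sample values are assumptions, not taken from the text above.

```python
def avg_read_latency_ms(total_read_time_s, read_ops):
    """Average read latency in ms for one period.

    Returns 0.0 when no read operations completed, mirroring the
    zero-value behavior described above.
    """
    if read_ops <= 0:
        return 0.0
    return total_read_time_s / read_ops * 1000.0


# Hypothetical one-minute samples: (seconds spent on reads, completed reads).
samples = [(1.2, 600), (0.0, 0), (45.0, 900)]
latencies = [avg_read_latency_ms(t, n) for t, n in samples]

for (t, n), latency in zip(samples, latencies):
    print(f"{n} reads, {t} s total read time: {latency:.1f} ms/op")
```

Note that an idle minute reports zero, so a zero datapoint means "no reads completed", not "reads were instant".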
Additionally, monitor these complementary metrics:
- VolumeAvgIOPS and VolumeAvgThroughput to compare actual performance against your provisioned 3000 IOPS
- VolumeIOPSExceededCheck and VolumeThroughputExceededCheck to determine if your workload consistently attempted to drive IOPS or throughput greater than provisioned limits in any given minute (returns 1 if exceeded, 0 if within limits)
- VolumeStalledIOCheck to monitor overall volume status
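Because the exceeded checks return 1 or 0 per minute, one simple way to judge whether the limit is hit "consistently" is to count the share of minutes that reported 1. The sketch below assumes the datapoints have already been exported as a list of 0/1 values; the sample hour is hypothetical.

```python
from typing import Sequence


def exceeded_fraction(checks: Sequence[int]) -> float:
    """Fraction of one-minute datapoints where the check reported 1."""
    if not checks:
        return 0.0
    return sum(1 for value in checks if value >= 1) / len(checks)


# Hypothetical hour of VolumeIOPSExceededCheck datapoints (one per minute).
iops_checks = [0] * 45 + [1] * 15
fraction = exceeded_fraction(iops_checks)

print(f"{fraction:.0%} of minutes attempted more than the provisioned IOPS")
```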
OS-Level Metrics: For higher resolution metrics beyond CloudWatch's per-minute granularity, you can access Amazon EBS detailed performance statistics directly from the instance using the nvme block device. This provides per-second latency values and helps track latency outliers. You can access these statistics by running the ebsnvme script from the instance CLI, which will show detailed I/O latency histograms for both read and write operations.
2. Identifying the root cause
Check for resource contention: Given your high memory utilization (80%) and the processes you've identified (svchost.exe and MsMpEng.exe), consider:
- Windows Defender (MsMpEng.exe) can cause significant I/O activity during scans
- Multiple svchost.exe instances may indicate background Windows updates or services
- The AppXSVC and Windows Modules Installer events suggest Windows Update activity
Verify if you're hitting performance limits: Your gp3 volume is provisioned with 3000 IOPS. Check if your application is attempting to drive more IOPS or throughput than provisioned. If you experience micro-bursting (performance bursts shorter than the one-minute CloudWatch interval), the average metrics may show lower performance than actual peak demand.
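The micro-bursting effect can be seen with a small worked example. The sketch below uses hypothetical per-second demand figures (a five-second burst followed by a quiet period) to show how a one-minute average can look comfortable while the workload briefly asks for far more than the provisioned 3000 IOPS.

```python
PROVISIONED_IOPS = 3000  # gp3 provisioning from the question


def summarize_minute(demand_per_second, limit=PROVISIONED_IOPS):
    """Return (average, peak, seconds above limit) for per-second demand."""
    if not demand_per_second:
        raise ValueError("need at least one sample")
    average = sum(demand_per_second) / len(demand_per_second)
    peak = max(demand_per_second)
    seconds_over = sum(1 for d in demand_per_second if d > limit)
    return average, peak, seconds_over


# Hypothetical demand: 5 s burst at 6000 IOPS, then 55 s at 200 IOPS.
demand = [6000] * 5 + [200] * 55
average, peak, seconds_over = summarize_minute(demand)

print(f"average {average:.0f} IOPS, peak {peak} IOPS, "
      f"{seconds_over} s above {PROVISIONED_IOPS} IOPS")
```

In this made-up minute the average stays well under the limit even though I/O would have queued during the burst, which is why per-second statistics are more useful for spotting it.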
Consider I/O characteristics: For transaction-intensive applications requiring low latency and high IOPS, ensure your queue length is optimal. If your I/O latency is higher than required and you're consistently exceeding provisioned performance, you may need to provision higher IOPS or consider a larger volume.
3. Safe troubleshooting approach
Non-disruptive monitoring:
- Enable detailed CloudWatch monitoring if not already active
- Review CloudWatch metrics over a longer time period to identify patterns
- Check the I/O performance status, which is available for gp3 volumes and compares actual performance to expected performance (statuses include Normal, Degraded, Severely Degraded, or Stalled)
Gradual remediation steps:
- Schedule Windows Defender scans during off-peak hours
- Stagger Windows Updates to minimize impact on jump host availability
- If consistently hitting IOPS limits, consider increasing provisioned IOPS on your gp3 volume (you can modify this without downtime)
- For persistent latency issues despite adequate provisioning, consider using AWS Fault Injection Service to run controlled experiments and test your architecture's resilience to storage performance degradation
Capacity planning:
If monitoring confirms you're consistently exceeding your volume's provisioned performance, consider using larger volumes or volumes with higher IOPS to meet your application requirements without impacting ongoing operations.
Sources
Amazon EBS I/O characteristics and monitoring - Amazon EBS
Amazon EBS volume events - Amazon EBS
Test and build application resilience using Amazon EBS latency injection | AWS Storage Blog
Amazon CloudWatch metrics for Amazon EBS - Amazon EBS
Amazon EBS volume status checks - Amazon EBS

Let me try this first and get back to you. Thanks.