How do I troubleshoot high CPU utilization on an EC2 Linux instance?

I want to troubleshoot high CPU utilization in my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance.

Resolution

High CPU usage can occur because of application-level activity, underprovisioned instances, or monitoring mismatches. To troubleshoot high CPU usage, check your environment's steal time metrics. CPU steal time is the time that an instance's virtual CPU is ready to run but can't proceed because the underlying physical resources are allocated elsewhere. High steal time degrades application performance and causes slowdowns, timeouts, and inconsistent behavior.

To troubleshoot high CPU usage on your Linux instance, take the following actions.

Note: You might observe differences between Amazon CloudWatch metrics and instance-level tool metrics. This occurs because CloudWatch collects metrics at the hypervisor level, but instance tools measure from within the guest operating system (OS). CloudWatch also samples at 1-minute or 5-minute intervals, whereas instance tools can provide second-by-second data. Time synchronization and CPU usage calculation methods might also cause discrepancies.

Measure your configuration's CPU steal time

To view your configuration's steal time, run the following command:

top

Example output:

top - 14:23:45 up 7 days, 2:03, 1 user, load average: 0.45, 0.50, 0.45 
Tasks: 105 total, 1 running, 104 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.3 us, 2.1 sy, 0.0 ni, 85.6 id, 1.2 wa, 0.0 hi, 0.3 si, 5.5 st
MiB Mem : 3949.2 total, 146.7 free, 1367.8 used, 2434.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2312.8 avail Mem

In the output, check the st value to get the percentage of CPU steal time. In the preceding example, steal time accounts for 5.5% of all CPU time. It's a best practice to keep steal time below 5%. If your steal time is consistently above 10%, then check your instance for misconfiguration issues.
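
For a value that is easier to log or alert on than an interactive top screen, the steal percentage can also be computed from two readings of the aggregate cpu line in /proc/stat, which holds the kernel's cumulative CPU counters. The following is a minimal sketch rather than an official AWS tool. The function name steal_pct is hypothetical, and the sketch assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# steal_pct LINE1 LINE2
# Prints the steal percentage between two aggregate "cpu" lines
# taken from /proc/stat (hypothetical helper, not part of any package).
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Fields 2-9 are user..steal; guest time is already counted in user.
      for (i = 2; i <= 9; i++) total += $i
      steal = $9
      if (NR == 1) { t0 = total; s0 = steal }
      else {
        dt = total - t0
        if (dt > 0) printf "%.1f\n", 100 * (steal - s0) / dt
        else print "0.0"
      }
    }'
}

# Example (on the instance), sampling over 5 seconds:
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   steal_pct "$a" "$b"
```

A longer sampling window smooths out short spikes, which makes it easier to tell whether steal time is consistently above the thresholds described above.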

Use CloudWatch to monitor your CPU

Use CloudWatch to monitor your instance performance, and then check the CPUUtilization metric. You can also check CPUCreditUsage and CPUCreditBalance for t2 and t3 instances.

If there's a discrepancy between CloudWatch metrics and what you see in the instance, then verify that you correctly configured the CloudWatch agent. Check /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log for errors.
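
As a quick way to scan that log, the following sketch prints the most recent lines that look like errors. The helper name agent_log_errors is hypothetical, and the match pattern is an assumption: it looks for an "E!" level tag or the word "error", so adjust it if your agent version formats log lines differently.

```shell
# agent_log_errors [LOGFILE] [COUNT]
# Prints the last COUNT (default 20) lines that look like errors.
agent_log_errors() {
  log="${1:-/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log}"
  # Assumption: error lines carry an "E!" tag or contain "error".
  grep -iE 'E!|error' "$log" | tail -n "${2:-20}"
}

# Example (on the instance; reading the log might require sudo):
#   agent_log_errors
```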

Note: CloudWatch doesn't provide steal time metrics by default. Instead, you must configure the CloudWatch agent to add custom metrics.
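
The following sketch shows what a minimal agent configuration that publishes steal time might look like. It assumes the agent's documented cpu metrics section and the cpu_usage_steal measurement name. The function name print_steal_agent_config and the 60-second interval are illustrative choices, so verify the fragment against the CloudWatch agent documentation and merge it with your existing configuration before you use it.

```shell
# Prints a minimal CloudWatch agent configuration that collects
# aggregate CPU steal time (hypothetical helper; review before use).
print_steal_agent_config() {
  cat <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_steal"],
        "totalcpu": true,
        "metrics_collection_interval": 60
      }
    }
  }
}
EOF
}

# Example: write the fragment to a file of your choice for review.
#   print_steal_agent_config > steal-config.json
```

The function only prints the JSON, so nothing changes on the instance until you save the output and load it into the agent.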

Use instance-level tools to monitor the instance

For a more user-friendly interface to view your configuration's steal time, use htop instead of the top command. To download htop, see htop-dev/htop Releases on the GitHub website.

To view detailed system-level resource usage such as memory, paging, I/O, and CPU over time, run the following command:

vmstat 1

Example output:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 150272  97545 2394124    0    0     0     5    1    2  5  2 86  1  6
 0  0      0 150272  97545 2394124    0    0     0     0  435  625  4  2 88  0  6

Note: To view steal time, check the st column.
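
To summarize several vmstat samples instead of reading the st column row by row, the output can be piped through a small filter. The following is a minimal sketch. The function name vmstat_avg_steal is hypothetical, and it locates the st column by its header label because newer vmstat versions might print additional columns.

```shell
# Reads vmstat output on stdin and prints the average of the st column.
vmstat_avg_steal() {
  awk '
    # Header row: remember which column is labelled "st".
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "st") col = i; next }
    # Data rows start with a number (the run-queue length).
    col && $1 ~ /^[0-9]+$/ { sum += $col; n++ }
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }'
}

# Example: average steal time over five 1-second samples.
#   vmstat 1 5 | vmstat_avg_steal
```

Note that the first row that vmstat prints reports averages since boot, so a longer run gives a more representative result.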

To view historical data about your resources, run the following command:

sar 

Example output:

$ sar -P ALL 1 3
Linux 5.4.0-1045-aws (ip-10-0-1-100)  04/22/2025  _x86_64_  (2 CPU)
14:25:00  CPU  %user  %nice  %system  %iowait  %steal  %idle
14:25:01  all   4.50   0.00     2.00     1.00    5.50  87.00
14:25:01    0   4.00   0.00     2.00     1.00    6.00  87.00
14:25:01    1   5.00   0.00     2.00     1.00    5.00  87.00

Note: You can configure sar to collect CPU metrics at regular intervals. For more information, see The sar command on the Red Hat website.
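
When sar has collected data for a while, it can help to list only the intervals where steal time crossed a threshold. The following is a minimal sketch. The function name sar_high_steal and the default threshold of 10 are illustrative choices, and the filter assumes that the report includes a header row with CPU and %steal columns, as in the preceding example.

```shell
# sar_high_steal [THRESHOLD]
# Reads sar CPU output on stdin and prints the timestamp and %steal of
# every "all" row whose %steal exceeds THRESHOLD (default 10).
sar_high_steal() {
  awk -v limit="${1:-10}" '
    # Header row: locate the CPU and %steal columns by name.
    /%steal/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "%steal") s = i
        if ($i == "CPU") c = i
      }
      next
    }
    # Aggregate rows only; skip the trailing "Average:" summary.
    s && c && $c == "all" && $1 != "Average:" && $s + 0 > limit + 0 { print $1, $s }'
}

# Example: intervals from today's collected data with more than 5% steal.
#   sar -u | sar_high_steal 5
```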

Check your instance type

Some instance types are more susceptible to steal time issues than others. For example, sustained high CPU usage can deplete CPU credits and throttle performance on burstable instance types such as t2 and t3.

If the credit balance of your burstable instance type is consistently low or at 0, then take one of the following actions:

Optimize your instance or application configuration

If a specific process uses high CPU, then take the following actions:

  • Investigate whether the high CPU usage is expected behavior.
  • Check application logs for errors or unexpected behavior.
  • Restart the application or service.
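
To identify which process to investigate, you can rank processes by CPU usage. The following is a minimal sketch. The function name top_cpu is hypothetical, and it expects ps output on stdin with the CPU percentage in the first column and a single header row.

```shell
# top_cpu [N]
# Reads "ps -eo pcpu,pid,comm" output on stdin and prints the N
# (default 5) rows with the highest CPU percentage.
top_cpu() {
  # Skip the header row, sort numerically by the first column
  # in descending order, and keep the first N rows.
  tail -n +2 | sort -k1,1nr | head -n "${1:-5}"
}

# Example: the three busiest processes right now.
#   ps -eo pcpu,pid,comm | top_cpu 3
```

Keep in mind that ps reports CPU usage averaged over each process's lifetime, so a process that only recently became busy might rank lower here than it does in top.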

If the CPU usage is consistent with expected behavior but still too high, then tune your application to improve efficiency. For example, it's a best practice to move compute-heavy workloads to a different instance or container.

Manage your traffic and load patterns

If high CPU usage is a recurring issue because of traffic or load patterns, then take the following actions:

Related information

How do I troubleshoot an EC2 Linux instance that fails a status check due to over-utilization of resources?

AWS OFFICIAL
Updated 11 days ago