How do I troubleshoot high CPUUtilization from the unified CloudWatch agent on my server?
I use the unified Amazon CloudWatch agent to push metrics and logs to CloudWatch. However, the CloudWatch agent has high CPUUtilization.
The CloudWatch agent might experience high CPU or memory usage for various reasons. Depending on your requirements and use case, complete any of these troubleshooting steps to optimize the CloudWatch agent's CPUUtilization.
Update the CloudWatch agent
Make sure that you're using the latest version of the CloudWatch agent. Newer versions usually include bug fixes and performance improvements.
To check which version is installed, run the CloudWatch agent status command. If the status output shows an older version, then update the agent to the latest version. For more information on the latest CloudWatch agent releases, see amazon-cloudwatch-agent on GitHub.
It's also a best practice to verify the signature of the CloudWatch agent package.
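As a minimal sketch of the version check, the following Python snippet parses the output of the status command. It assumes that the output is a JSON document with status and version fields, as printed on Linux by a command such as sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status. The sample values are illustrative and don't come from a real host.

```python
import json


def summarize_agent_status(status_output: str) -> dict:
    """Extract the fields that matter for an upgrade decision.

    Assumes the status command prints JSON with "status" and "version" keys.
    """
    data = json.loads(status_output)
    return {
        "running": data.get("status") == "running",
        "version": data.get("version", "unknown"),
    }


# Illustrative sample output; the version value is a placeholder.
sample = '{"status": "running", "configstatus": "configured", "version": "1.300000.0"}'
print(summarize_agent_status(sample))
```

You can then compare the reported version with the latest release that's listed on GitHub.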
Reduce the use of asterisks
The use of asterisks (wildcard symbols) might cause the CloudWatch agent process to open and monitor a large number of files:
- When an absolute path is specified in the configuration, the CloudWatch agent goes to the specific directory to read the specified log file.
- When the path is specified with a wildcard such as * (asterisk) or ** (super asterisk), the agent regularly scans directories for new, matching files. For example, specifying /var/log/**.log prompts the unified CloudWatch agent to collect all .log files in the /var/log directory tree. In this case, the unified CloudWatch agent might have high CPUUtilization because of the number of log files that it checks for updates.
To make sure that the agent doesn't search through a large volume of files, specify absolute file paths. For example, modify the file_path from /var/log/**.log to /var/log/file.log.
Note: The unified CloudWatch agent uses a state file to detect changes through a byte offset. Therefore, the agent opens log files only if the byte offset changes.
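To estimate how many files a wildcard path makes the agent track, you can count the matching files in the directory tree. The following Python sketch walks a directory and counts files with a given suffix. It only approximates the agent's own matching rules, and the function name count_log_files is hypothetical.

```python
import os


def count_log_files(root: str, suffix: str = ".log") -> int:
    """Count files under root (searched recursively) whose names end with suffix."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.endswith(suffix))
    return total


if __name__ == "__main__":
    # A large number here suggests that a wildcard such as /var/log/**.log
    # forces the agent to watch many files.
    print(count_log_files("/var/log"))
```

If the count is much larger than the number of files that you actually need, then replace the wildcard with absolute paths.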
Adjust the metrics collection interval
Review the agent configuration file to verify that you set it up correctly. You might need to adjust the metrics collection interval or reduce the number of metrics that the agent collects. For more information, see the metrics_collection_interval field in CloudWatch agent configuration file: Agent section.
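As an illustration, the following Python sketch builds a small agent configuration with a longer collection interval and a short list of metrics. The 300-second interval and the chosen measurements are assumptions for the example, not recommended values.

```python
import json


def build_agent_config(interval_seconds: int) -> dict:
    """Return a minimal agent configuration with a global collection interval."""
    return {
        # The agent section sets the default interval for all metrics.
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "metrics_collected": {
                # Collect only the measurements that you need.
                "cpu": {"measurement": ["cpu_usage_idle"], "totalcpu": True},
                "mem": {"measurement": ["mem_used_percent"]},
            }
        },
    }


print(json.dumps(build_agent_config(300), indent=2))
```

A longer interval and fewer measurements mean that the agent does less work in each collection cycle.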
Reduce the filters or patterns for the procstat plugin configuration
The procstat plugin allows you to collect metrics from individual processes. If you configure the plugin with many processes, filters, and patterns, then the agent's CPU usage might increase.
Limit the number of processes that the procstat plugin monitors, based on your use case. Adjusting the metrics_collection_interval might also reduce CPU usage. If you omit this parameter, then the plugin uses the default value of 60 seconds. For more information, see Collect process metrics with the procstat plugin.
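The following Python sketch shows what a narrow procstat configuration might look like: one process, two measurements, and an explicit interval. The process name nginx and the 120-second interval are hypothetical values for the example.

```python
import json


def build_procstat_section(exe_name: str, interval_seconds: int = 60) -> dict:
    """Return a metrics section that monitors a single process by name."""
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {
                        # Match one executable instead of a broad pattern.
                        "exe": exe_name,
                        "measurement": ["cpu_usage", "memory_rss"],
                        "metrics_collection_interval": interval_seconds,
                    }
                ]
            }
        }
    }


print(json.dumps(build_procstat_section("nginx", 120), indent=2))
```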
Remove older log files
Implement log rotation
The unified CloudWatch agent might use a lot of CPU when it monitors large log files on the path that you provide for the file_path field. To troubleshoot this issue, implement log rotation to make sure that the log files don't get too large. Also, remove older log files that you no longer need from the folder.
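To find candidates for rotation or cleanup, you can list the files in a log directory that exceed a size threshold. The following Python sketch does this for a single directory. The function name find_large_files and the threshold are assumptions for the example.

```python
import os


def find_large_files(directory: str, max_bytes: int) -> list:
    """Return (name, size) pairs for files larger than max_bytes, largest first."""
    large = []
    for entry in os.scandir(directory):
        if entry.is_file():
            size = entry.stat().st_size
            if size > max_bytes:
                large.append((entry.name, size))
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Example: report files over 100 MiB in the current directory.
    for name, size in find_large_files(".", 100 * 1024 * 1024):
        print(f"{name}: {size} bytes")
```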
Configure auto_removal to remove older log files
Set the CloudWatch agent's built-in auto_removal field to true. The unified CloudWatch agent then automatically removes old log files after it uploads them to Amazon CloudWatch Logs. If you omit this field, then it defaults to false.
The agent removes only complete files from logs that create multiple files, such as logs that create a separate file for each date. If a log continuously writes to a single file, then the agent doesn't remove that file. If you need to retain older logs, then back them up before you turn on auto_removal.
For more information, see CloudWatch agent configuration file: Logs section.
Note: If you already implemented a log file rotation or removal method, then it's a best practice to omit the auto_removal field or explicitly set it to false.
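As a sketch of where the field goes, the following Python snippet builds a logs section with one collect_list entry that turns on auto_removal. The file path and log group name are hypothetical.

```python
import json


def build_logs_section(file_path: str, log_group_name: str, auto_removal: bool = False) -> dict:
    """Return a logs section with one file entry and an explicit auto_removal value."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": file_path,
                            "log_group_name": log_group_name,
                            # False matches the agent's default when the field is omitted.
                            "auto_removal": auto_removal,
                        }
                    ]
                }
            }
        }
    }


# Hypothetical application that writes one log file per date.
print(json.dumps(build_logs_section("/var/log/myapp/app-*.log", "myapp-logs", True), indent=2))
```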
Limit CPU usage
On a Linux operating system
Restrict the CloudWatch agent's CPU usage to stay within a certain percentage:
- Open the /etc/systemd/system/amazon-cloudwatch-agent.service file.
- Add CPUQuota as a new field to the [Service] section:
    [Service]
    Type=simple
    ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent
    KillMode=process
    Restart=on-failure
    RestartSec=60s
    CPUQuota=30%
- Reload the systemd configuration so that systemd reads the changed unit file, and then restart the CloudWatch agent.
With this setting, systemd limits the agent to the specified percentage of CPU time. The percentage is relative to a single CPU, so values above 100% are valid on systems with multiple cores. You can adjust the CPUQuota percentage to any number that best suits your use case. For more information, see systemd on GitHub.
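If you manage many hosts, then you might prefer to script the unit file change. The following Python sketch adds or replaces the CPUQuota line in the [Service] section of a unit file's text. It's a simplified example, and the function name set_cpu_quota is hypothetical. It works on a string and doesn't write to disk.

```python
def set_cpu_quota(unit_text: str, percent: int) -> str:
    """Return unit_text with CPUQuota=<percent>% set in the [Service] section."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    quota_line = f"CPUQuota={percent}%"
    result = []
    in_service = False
    written = False
    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts: add the quota if [Service] ends without one.
            if in_service and not written:
                result.append(quota_line)
                written = True
            in_service = stripped == "[Service]"
        elif in_service and stripped.startswith("CPUQuota="):
            # Replace an existing quota instead of adding a duplicate.
            line = quota_line
            written = True
        result.append(line)
    if in_service and not written:
        result.append(quota_line)
        written = True
    if not written:
        raise ValueError("no [Service] section found")
    return "\n".join(result) + "\n"


example = "[Unit]\nDescription=example\n\n[Service]\nType=simple\nRestart=on-failure\n"
print(set_cpu_quota(example, 30))
```

After you write the updated text back to the unit file, run systemctl daemon-reload and then systemctl restart amazon-cloudwatch-agent so that the new limit takes effect.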
On a Windows operating system
Use the Set priority and Set affinity features in Task Manager. These features let you control how the CloudWatch agent uses system resources:
- Set priority: This changes the priority level of a running process. The priority level determines how much CPU time the unified CloudWatch agent process receives, relative to other running processes on the system.
- Set affinity: This controls the CPU cores that a process can run on. For CPUs with multiple cores, each core handles individual threads for running processes. By default, processes can use all available CPU cores. To restrict the number of cores that a process can use, change the affinity. For example, if your CPU has eight cores, then you can limit the unified CloudWatch agent to run on four of those cores.
To modify the Set priority and Set affinity options in Windows Task Manager, complete the following steps:
- Open Task Manager, and then choose the Details tab.
- Find the CloudWatch agent process, named amazon-cloudwatch-agent.exe. Then, open the context (right-click) menu.
- Choose Set priority or Set affinity, and then select the priority level or the CPU cores that you want.
Note: Limiting CPUUtilization for a process might affect its performance. It's a best practice to test and monitor your system after you modify these settings.
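Tools that set affinity programmatically, such as the .NET Process.ProcessorAffinity property, generally express the core selection as a bitmask in which bit n stands for core n. The following Python sketch computes that mask for a list of core indexes. The function name affinity_mask is hypothetical, and the snippet only does the arithmetic. It doesn't change any process.

```python
def affinity_mask(cores) -> int:
    """Return a bitmask with one bit set for each core index in cores."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indexes must be non-negative")
        mask |= 1 << core
    return mask


# Cores 0 through 3 of an eight-core CPU.
print(affinity_mask(range(4)))       # 15
print(hex(affinity_mask(range(4))))  # 0xf
```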
Upgrade to a higher instance type
If you can't make changes to the agent configuration file, then you can upgrade your instances to a higher instance type. This provides more CPU and memory resources for the applications and the unified CloudWatch agent. As a result, the agent is better able to process incoming logs before it publishes them to CloudWatch.