I want to troubleshoot performance bottlenecks within my EC2 Linux instances. What advanced tools can I use with EC2Rescue for Linux to do that?
I want to troubleshoot performance bottlenecks within my Amazon Elastic Compute Cloud (Amazon EC2) instances running Amazon Linux 1 or 2, RHEL, or CentOS. What tools can I use with EC2Rescue for Linux to do that?
Short description
Performance bottlenecks on Linux instances running Amazon Linux 1 or 2, RHEL, or CentOS can occur in CPU performance, block I/O performance, or network performance. To determine where a bottleneck is happening, you can use the 33 bcc-based tools available in EC2Rescue for Linux. bcc (BPF Compiler Collection) is a toolkit for building tracing tools on eBPF (extended Berkeley Packet Filter). Because eBPF programs are checked by an in-kernel verifier and then run inside the kernel, these tools can monitor production environments efficiently and safely, without significant performance overhead.
Note: This resolution doesn't apply to instances running Debian or Ubuntu.
Resolution
(For experienced Linux system administrators)
Install the bcc package for your operating system.
1. Connect to your instance using SSH.
2. Install the bcc package. For download and installation instructions for distributions other than Amazon Linux, refer to the documentation specific to your distribution. For Amazon Linux instances, use the following command:
$ sudo yum install bcc
3. The bcc tools must be in the PATH variable on your operating system for EC2Rescue for Linux to run them. Use the following commands to switch to a root shell and add the tools to the PATH variable:
$ sudo -s
# export PATH=$PATH:/usr/share/bcc/tools/
4. It's a best practice to make the PATH setting permanent. The steps to do this vary by Linux distribution. For Amazon Linux, do the following:
Open ~/.bash_profile using the vi editor:
# vi ~/.bash_profile
Append /usr/share/bcc/tools to the PATH variable:
PATH=$PATH:$HOME/bin:/usr/share/bcc/tools
Save the file and exit the vi editor.
Source the updated profile:
# source ~/.bash_profile
5. Download and install EC2Rescue for Linux, and then navigate to the installation directory on your instance.
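If you prepare several instances, the PATH steps above can be scripted. The following Bash functions are a minimal sketch, not part of EC2Rescue for Linux: add_to_path, persist_path, and missing_bcc_tools are hypothetical helper names, BCC_TOOLS_DIR assumes the /usr/share/bcc/tools location used in step 3, and the tool list assumes the modules described in this article. persist_path appends a separate export line instead of editing the existing PATH line, so it is safe to run more than once.

```shell
# Bash sketch: hypothetical helpers for the PATH steps (source this file as root).
# Assumption: bcc installs its tools in /usr/share/bcc/tools, as in step 3.
BCC_TOOLS_DIR="${BCC_TOOLS_DIR:-/usr/share/bcc/tools}"

# Add a directory to PATH for the current shell unless it is already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                        # already present: nothing to do
    *) export PATH="$PATH:$1" ;;
  esac
}

# Append an export line to a profile file once, so repeated runs don't duplicate it.
persist_path() {
  local profile="$1" dir="$2"
  local line="export PATH=\"\$PATH:$dir\""
  touch "$profile"
  grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Print the expected bcc tools that are missing or not executable.
# Returns 0 when all are present, 1 otherwise.
# The list is an assumption: the tools behind the modules named in this article.
missing_bcc_tools() {
  local dir="$1" tool
  local -a missing=()
  for tool in softirqs runqlat biolatency ext4slower xfsslower fileslower \
              tcpconnlat tcptop tcplife; do
    [ -x "$dir/$tool" ] || missing+=("$tool")
  done
  [ "${#missing[@]}" -eq 0 ] && return 0
  printf '%s\n' "${missing[@]}"
  return 1
}
```

For example, after sudo -s, source the file and run add_to_path "$BCC_TOOLS_DIR", then persist_path ~/.bash_profile "$BCC_TOOLS_DIR", and finally missing_bcc_tools "$BCC_TOOLS_DIR", which prints nothing when every expected tool is installed.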
The following are commonly used bcc-based modules for EC2Rescue for Linux.
CPU performance tools
bccsoftirqs.yaml - This module runs the softirqs tool, which traces soft interrupts (soft IRQs) and stores timing statistics in-kernel for efficiency. Set the interval with the --period argument and the number of outputs with the --times argument. The tool automatically prints a timestamp each time it produces output. For more information, see EC2Rescue for Linux - bccsoftirqs.yaml on the GitHub website.
bccrunqlat.yaml - This module runs the runqlat tool, which shows how long tasks spent waiting in the run queue for their turn on a CPU. Results are shown as a histogram. For more information, see EC2Rescue for Linux - bccrunqlat.yaml on the GitHub website.
# ./ec2rl run --only-modules=bccsoftirqs,bccrunqlat --period=5 --times=5
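Because bccrunqlat produces a histogram, it can help to reduce it to a single number. The following Bash function, hist_tail, is a hypothetical helper (not part of EC2Rescue for Linux) that counts the events in buckets starting at or above a latency threshold. It assumes the default bucket layout of the upstream bcc runqlat tool (lines such as "4 -> 7 : 203 |****|", in microseconds). It also assumes that the module log is at mod_out/run/bccrunqlat.log inside the run directory, by analogy with the bcctcptop.log path in the output example. When --times is greater than 1, the log holds several histograms and the function adds them together. biolatency prints the same layout, so the function may also work on a bccbiolatency log; check the log before relying on it.

```shell
# Bash sketch: summarize a bcc log2 histogram (runqlat style) from an ec2rl log.
# Assumption: bucket lines look like "   4 -> 7   : 203   |****   |".
# Usage: hist_tail THRESHOLD_USECS LOGFILE
# Prints: <events in buckets starting at or above threshold> <total events> <percent>
hist_tail() {
  local threshold="$1" log="$2"
  awk -v t="$threshold" '
    $2 == "->" && $4 == ":" && $1 ~ /^[0-9]+$/ {
      total += $5                     # every bucket counts toward the total
      if ($1 + 0 >= t) slow += $5     # bucket lower bound is at or above threshold
    }
    END {
      pct = (total > 0) ? 100 * slow / total : 0
      printf "%d %d %.1f\n", slow, total, pct
    }' "$log"
}
```

For example, hist_tail 1024 <run-directory>/mod_out/run/bccrunqlat.log reports how many run queue waits fell in buckets starting at 1024 microseconds (about 1 ms) or more, the total number of events, and that share as a percentage.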
Block I/O performance tools
bccbiolatency.yaml - Traces block device I/O and records the distribution of I/O latency (time) for each disk device attached to your EC2 instance, such as instance store volumes and Amazon Elastic Block Store (Amazon EBS) volumes. Results are printed as a histogram. The module runs for the specified period and collects output a specified number of times. In the example below, the period and times variables are set to 5. For more information, see EC2Rescue for Linux - bccbiolatency.yaml on the GitHub website.
bccext4slower.yaml - Collects output using the ext4slower tool. ext4slower traces any ext4 reads, writes, opens, and fsyncs that are slower than a threshold of 10 ms by default. The module runs for the specified period and collects output a specified number of times. In the example below, the period and times variables are set to 5. For more information, see EC2Rescue for Linux - bccext4slower.yaml on the GitHub website.
You can use the bccxfsslower.yaml module in the same way as bccext4slower.yaml for XFS file systems. For more information, see EC2Rescue for Linux - bccxfsslower.yaml on the GitHub website.
bccfileslower.yaml - Collects output using the fileslower tool, which traces file-based synchronous reads and writes that are slower than a threshold of 10 ms by default. The module runs for the specified period and then collects output a specified number of times. In the example below, the period and times variables are set to 5. For more information, see EC2Rescue for Linux - bccfileslower.yaml on the GitHub website.
# ./ec2rl run --only-modules=bccbiolatency,bccext4slower,bccfileslower --period=5 --times=5
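To find the worst offender in a bccext4slower, bccxfsslower, or bccfileslower log, you can scan the LAT(ms) column. The following hypothetical Bash helper, slowest_op, is a sketch that assumes the upstream bcc column headers (a LAT(ms) column and a trailing FILENAME column). It reads the latency column position from the header, because ext4slower and fileslower place that column at different positions. Process names or filenames that contain spaces can shift the columns, so treat the result as a starting point.

```shell
# Bash sketch: report the slowest operation in an ext4slower/xfsslower/fileslower log.
# Assumption: a header line contains a "LAT(ms)" column and FILENAME is the last field.
# Usage: slowest_op LOGFILE  -> prints "<count> <max_ms> <filename>"
slowest_op() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        if ($i == "LAT(ms)") { col = i; next }   # header line: remember the column
    }
    col && $col ~ /^[0-9]+(\.[0-9]+)?$/ {
      n++
      if ($col + 0 > max) { max = $col + 0; file = $NF }
    }
    END { printf "%d %.2f %s\n", n, max, (n ? file : "-") }' "$1"
}
```

For example, slowest_op <run-directory>/mod_out/run/bccext4slower.log prints the number of slow operations logged, the highest latency in milliseconds, and the file involved.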
Network performance tools
bcctcpconnlat.yaml - Traces the kernel function that performs active TCP connections (for example, through a connect() syscall) and displays the latency (time) of each connection during the specified period. Latency is measured locally: it's the time from when the SYN is sent to when the response packet arrives. TCP connection latency indicates how long it takes to establish a connection. For more information, see EC2Rescue for Linux - bcctcpconnlat.yaml on the GitHub website.
bcctcptop.yaml - Displays TCP connection throughput per host and port for the specified period and times without clearing the screen. For more information, see EC2Rescue for Linux - bcctcptop.yaml on the GitHub website.
bcctcplife.yaml - Summarizes TCP sessions that open and close while tracing. For more information, see EC2Rescue for Linux - bcctcplife.yaml on the GitHub website.
# ./ec2rl run --only-modules=bcctcpconnlat,bcctcptop,bcctcplife --period=5 --times=5
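On a busy instance, bcctcplife output can be long. As a hypothetical example, the following Bash function, longest_sessions, lists the longest-lived sessions. It assumes the upstream bcc tcplife header (PID COMM LADDR LPORT RADDR RPORT TX_KB RX_KB MS) and locates columns by name. It is a sketch to adapt, not a parser guaranteed to match every bcc version.

```shell
# Bash sketch: list the longest-lived TCP sessions recorded in a tcplife log.
# Assumption: the header contains RPORT, RADDR, COMM, and MS columns.
# Usage: longest_sessions LOGFILE [N]  -> "<ms> <comm> <raddr>:<rport>", longest first
longest_sessions() {
  local log="$1" n="${2:-5}"
  awk '
    {
      for (i = 1; i <= NF; i++)
        if ($i == "RPORT") {                      # header line
          for (j = 1; j <= NF; j++) idx[$j] = j   # map column names to positions
          next
        }
    }
    ("MS" in idx) && $(idx["MS"]) ~ /^[0-9]+(\.[0-9]+)?$/ {
      print $(idx["MS"]), $(idx["COMM"]), $(idx["RADDR"]) ":" $(idx["RPORT"])
    }' "$log" | sort -k1,1gr | head -n "$n"
}
```

For example, longest_sessions <run-directory>/mod_out/run/bcctcplife.log 10 prints up to ten sessions with the largest MS (duration) value, showing the duration, process name, and remote address and port.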
Output example
After each run of one or more modules, EC2Rescue for Linux stores the results in a timestamped subdirectory of /var/tmp/ec2rl on your instance.
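Run directories are named with a timestamp (for example, 2020-04-20T21_50_01.177374 in the output example that follows), so sorting them by name finds the newest run. The following Bash function, latest_ec2rl_run, is a hypothetical helper built on that assumption; it isn't an ec2rl feature.

```shell
# Bash sketch: print the most recent ec2rl run directory.
# Assumption: run directories are named with an ISO-style timestamp,
# so lexical (glob) order matches chronological order.
# Usage: latest_ec2rl_run [BASE_DIR]  (returns 1 if no run directory exists)
latest_ec2rl_run() {
  local base="${1:-/var/tmp/ec2rl}" latest="" d
  for d in "$base"/[0-9]*/; do
    [ -d "$d" ] || continue           # the glob matched nothing
    latest="${d%/}"                   # globs expand in sorted order; keep the last
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

For example, ls "$(latest_ec2rl_run)"/mod_out/run lists the module logs from the most recent run, and the same command substitution can supply the --upload-directory value for ./ec2rl upload.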
The following example is the output from the bcctcptop module with the period parameter set to 5 and the times parameter set to 2:
# ./ec2rl run --only-modules=bcctcptop --period=5 --times=2
# cat /var/tmp/ec2rl/2020-04-20T21_50_01.177374/mod_out/run/bcctcptop.log
I will collect tcptop output from this alami box 2 times.
Tracing... Output every 5 secs. Hit Ctrl-C to end

21:50:17 loadavg: 0.74 0.33 0.17 5/244 4285

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
3989   sshd         172.31.22.238:22      72.21.196.67:26601         0      9

21:50:22 loadavg: 0.84 0.36 0.18 4/244 4285

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
3989   sshd         172.31.22.238:22      72.21.196.67:26601         0     11
2731   amazon-ssm-a 172.31.22.238:54348   52.94.225.236:443          5      4
2938   amazon-ssm-a 172.31.22.238:58878   52.119.197.249:443         0      0
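When a bcctcptop run covers several intervals, per-process totals can be easier to read than the raw listing. The following hypothetical Bash function, tcptop_totals, sums the RX_KB and TX_KB columns by COMM across all intervals in the log. It relies on the PID COMM LADDR RADDR RX_KB TX_KB header shown in the preceding output, and the totals are approximate because each interval is reported in whole KB.

```shell
# Bash sketch: total the per-interval RX_KB/TX_KB values in a bcctcptop log by process.
# Assumption: each interval has a "PID COMM LADDR RADDR RX_KB TX_KB" header line.
# Usage: tcptop_totals LOGFILE  -> "<comm> <rx_kb> <tx_kb>" lines, highest TX first
tcptop_totals() {
  awk '
    $1 == "PID" && $2 == "COMM" {
      for (i = 1; i <= NF; i++) idx[$i] = i   # locate columns by header name
      next
    }
    ("RX_KB" in idx) && $1 ~ /^[0-9]+$/ && NF >= idx["TX_KB"] {
      comm = $(idx["COMM"])
      rx[comm] += $(idx["RX_KB"])
      tx[comm] += $(idx["TX_KB"])
    }
    END { for (c in rx) print c, rx[c], tx[c] }' "$1" | sort -k3,3nr -k1,1
}
```

Run against the bcctcptop.log shown above, it prints sshd 0 20 followed by amazon-ssm-a 5 4.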
You can upload results to AWS Support using the following command:
# ./ec2rl upload --upload-directory=/var/tmp/ec2rl/2020-04-20T21_50_01.177374 --support-url="URLProvidedByAWSSupport"
Note: The quotation marks in the preceding command are required. If you ran the tool with sudo, upload the results with sudo as well. Run ./ec2rl help upload for details on uploading the output with an Amazon Simple Storage Service (Amazon S3) presigned URL.
Related information
How do I diagnose high CPU utilization on an EC2 Windows instance?
