How do I use EC2Rescue for Linux to troubleshoot operating system-level issues?

I can't connect to my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance, or I'm experiencing boot issues. To correct these problems, I need to fix common issues such as incorrect OpenSSH file permissions, or gather operating system (OS) logs for analysis and troubleshooting. How can I use EC2Rescue for Linux to do this?

Short description

EC2Rescue for Linux is a tool that helps diagnose and troubleshoot problems on Amazon EC2 Linux instances. You run it on the instance itself, or against the instance's root volume if the instance is unreachable, to correct operating system-level issues. It also collects advanced logs, system utilization reports, and configuration files for further analysis.

Common scenarios addressed by EC2Rescue for Linux:

  • Collect system utilization reports such as vmstat, iostat, and mpstat.
  • Collect logs and details such as syslog, dmesg, application error logs, and SSM logs.
  • Detect system problems such as asymmetric routing or duplicate root device labels.
  • Automatically remediate system problems such as correcting OpenSSH file permissions or disabling known problematic kernel parameters.

System requirements

EC2Rescue for Linux requires an Amazon EC2 Linux instance that meets the following prerequisites:

Supported operating systems

  • Amazon Linux 2
  • Amazon Linux 2016.09+
  • SLES 12+
  • RHEL 7+
  • Ubuntu 16.04+

Software requirements

  • Python 2.7.9+ or 3.2+
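Before you start, you can check an instance against these requirements. The following is a minimal sketch, assuming bash, GNU sort -V, and an /etc/os-release file. The function names are hypothetical, and the OS rules only encode the supported list above.

```shell
#!/usr/bin/env bash
# Sketch: check the EC2Rescue for Linux prerequisites listed above.
# Assumes bash, GNU sort -V, and an os-release file. Function names are
# hypothetical; the OS rules only encode the supported list in this article.

# version_ge A B: succeed if version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# python_ok VERSION: succeed if VERSION is 2.7.9+ or 3.2+.
python_ok() {
  case "$1" in
    2.*) version_ge "$1" 2.7.9 ;;
    3.*) version_ge "$1" 3.2 ;;
    *) return 1 ;;
  esac
}

# python_version: print the version of python3 (or python), if installed.
python_version() {
  local py
  for py in python3 python; do
    if command -v "$py" >/dev/null 2>&1; then
      "$py" -c 'import platform; print(platform.python_version())'
      return
    fi
  done
  return 1
}

# os_supported FILE: succeed if an os-release style FILE names a supported OS.
os_supported() {
  local id version
  # Source the file in subshells so its variables don't leak into this shell.
  id=$(. "$1" && echo "$ID")
  version=$(. "$1" && echo "$VERSION_ID")
  case "$id" in
    amzn)
      case "$version" in
        2) return 0 ;;                                            # Amazon Linux 2
        20[0-9][0-9].[0-9][0-9]) version_ge "$version" 2016.09 ;; # Amazon Linux AMI
        *) return 1 ;;
      esac ;;
    sles) version_ge "$version" 12 ;;
    rhel) version_ge "$version" 7 ;;
    ubuntu) version_ge "$version" 16.04 ;;
    *) return 1 ;;
  esac
}

# Example:
#   python_ok "$(python_version)" && os_supported /etc/os-release && echo "Prerequisites met"
```

The version comparison relies on sort -V, so 3.10 correctly counts as newer than 3.2.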

Note: If you’ve enabled EC2 Serial Console for Linux, then you can use it to troubleshoot supported Nitro-based instance types. The serial console helps you troubleshoot boot issues, network configuration, and SSH configuration issues. The serial console connects to your instance without the need for a working network connection. You can access the serial console using the Amazon EC2 console or the AWS Command Line Interface (AWS CLI).

Before you use the serial console, grant access to it at the account level. Then create AWS Identity and Access Management (IAM) policies that grant access to your IAM users. Every instance that uses the serial console must also have at least one password-based user. If your instance is unreachable and you haven’t configured access to the serial console, follow the instructions in the Resolution section. For more information, see Configure access to the EC2 Serial Console.
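You can also grant access and connect with the AWS CLI. The following is a sketch, assuming placeholder values for the instance ID, Region, and SSH key path. The run and serial_console_connect helpers are hypothetical; with DRY_RUN=1 (the default), run only prints each command.

```shell
#!/usr/bin/env bash
# Sketch: enable EC2 Serial Console access and connect with the AWS CLI.
# Instance ID, Region, and key path are placeholders. The run helper is
# hypothetical: with DRY_RUN=1 (the default) it only prints each command.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# serial_console_connect INSTANCE_ID REGION KEY_PATH (KEY_PATH.pub must exist)
serial_console_connect() {
  local instance_id=$1 region=$2 key=$3
  # Check, then grant, serial console access at the account level.
  run aws ec2 get-serial-console-access-status --region "$region"
  run aws ec2 enable-serial-console-access --region "$region"
  # Push a temporary public key for the serial console session.
  run aws ec2-instance-connect send-serial-console-ssh-public-key \
    --instance-id "$instance_id" --serial-port 0 \
    --ssh-public-key "file://$key.pub" --region "$region"
  # Connect; at the login prompt, use the instance's password-based user.
  run ssh -i "$key" "$instance_id.port0@serial-console.ec2-instance-connect.$region.aws"
}
```

To run the commands instead of printing them, set DRY_RUN=0 when you call the function. The IAM policies for your users still need to be created separately.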

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

Resolution

To troubleshoot an unreachable Amazon EC2 Linux instance using EC2Rescue for Linux, do the following:

1.    Launch a new Amazon EC2 instance in your virtual private cloud (VPC) using the same Amazon Machine Image (AMI) and in the same Availability Zone as the impaired instance. The new instance becomes your "rescue" instance. Alternatively, use an existing instance that you can access, as long as it uses the same AMI and is in the same Availability Zone as your impaired instance. The Availability Zone must match because you can attach an EBS volume only to an instance in the same Availability Zone.

2.    Stop the impaired instance, and then detach its Amazon Elastic Block Store (Amazon EBS) root volume (/dev/xvda or /dev/sda1). You can't detach the root volume while the instance is running. Make a note of the device name so that you can use the same name when you reattach the volume later.

3.    Attach the EBS volume as a secondary device (/dev/sdf) to the rescue instance.
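If you prefer the AWS CLI for steps 2 and 3, the following sketch shows one possible sequence. The instance and volume IDs are placeholders, move_root_volume and run are hypothetical helpers, and with DRY_RUN=1 (the default) the commands are only printed. The wait commands block until the instance is stopped and the volume is available, so that the next command doesn't fail on an intermediate state.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-3 with the AWS CLI. All IDs are placeholders, and the
# helper names are hypothetical. DRY_RUN=1 (the default) only prints commands;
# set DRY_RUN=0 to run them against your account.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# move_root_volume IMPAIRED_ID VOLUME_ID RESCUE_ID
move_root_volume() {
  local impaired=$1 volume=$2 rescue=$3
  # Show the root device name and attached volumes; note the name for step 15.
  run aws ec2 describe-instances --instance-ids "$impaired" \
    --query 'Reservations[0].Instances[0].[RootDeviceName,BlockDeviceMappings]'
  # A root volume can't be detached while the instance is running.
  run aws ec2 stop-instances --instance-ids "$impaired"
  run aws ec2 wait instance-stopped --instance-ids "$impaired"
  run aws ec2 detach-volume --volume-id "$volume"
  run aws ec2 wait volume-available --volume-ids "$volume"
  # Attach to the rescue instance as a secondary device.
  run aws ec2 attach-volume --volume-id "$volume" \
    --instance-id "$rescue" --device /dev/sdf
}
```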

4.    Connect to your rescue instance using SSH.

5.    Become root, use lsblk to identify the device name of the attached volume's root partition, and then save it in a variable for use throughout the process:

$ sudo -i
# lsblk
# rescuedev=/dev/xvdf1

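Note: The volume might appear on the rescue instance under a device name other than /dev/xvdf1. For example, on Nitro-based instances EBS volumes are exposed as NVMe devices, so the first partition might be /dev/nvme1n1p1. Use the lsblk command to view your available disk devices along with their mount points, and then set rescuedev to the correct device name.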
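Because the partition name depends on how the volume is exposed, you can derive it from the disk name that lsblk shows. This is a minimal sketch with hypothetical function names. It assumes that the root file system is on the first partition, which is typical but not guaranteed, so confirm the result with lsblk.

```shell
#!/usr/bin/env bash
# Sketch: derive rescuedev from the disk name shown by lsblk, for example:
#   lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
# Function names are hypothetical.

# first_partition DISK: print the conventional name of DISK's first partition.
first_partition() {
  case "$1" in
    *[0-9]) echo "${1}p1" ;;  # NVMe style: /dev/nvme1n1 -> /dev/nvme1n1p1
    *)      echo "${1}1" ;;   # Xen style:  /dev/xvdf    -> /dev/xvdf1
  esac
}

# pick_rescuedev DISK: use the first partition if it exists as a block
# device; otherwise assume the file system is on the whole disk.
pick_rescuedev() {
  local part
  part=$(first_partition "$1")
  if [ -b "$part" ]; then echo "$part"; else echo "$1"; fi
}

# Example (as root on the rescue instance):
#   rescuedev=$(pick_rescuedev /dev/nvme1n1)
```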

6.    Choose a temporary mount point and make sure that it exists. Use /mnt unless it's already in use:

# rescuemnt=/mnt
# mkdir -p $rescuemnt

7.    Mount the root file system from the attached volume:

# mount $rescuedev $rescuemnt

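Note: If the mount fails, run dmesg | tail to check the kernel messages. If they report a duplicate UUID, mount the volume with the -o nouuid option. This is common with XFS root file systems, because a rescue instance launched from the same AMI has a root file system with the same UUID.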
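Steps 6 and 7, including the nouuid fallback from the preceding note, can be combined into two small functions. This sketch assumes the mountpoint utility from util-linux. The function names are hypothetical, and mount_rescue must run as root.

```shell
#!/usr/bash/env bash
# pick_mountpoint DIR...: print the first candidate that isn't already a mount point.
pick_mountpoint() {
  local dir
  for dir in "$@"; do
    if ! mountpoint -q "$dir" 2>/dev/null; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

# mount_rescue DEV DIR: mount DEV on DIR. If that fails, retry with the XFS
# nouuid option (for a duplicate file system UUID); if that also fails,
# show the latest kernel messages.
mount_rescue() {
  mkdir -p "$2" || return 1
  mount "$1" "$2" && return 0
  mount -o nouuid "$1" "$2" && return 0
  dmesg | tail
  return 1
}

# Example (as root):
#   rescuemnt=$(pick_mountpoint /mnt /mnt/rescue) && mount_rescue "$rescuedev" "$rescuemnt"
```

The retry with -o nouuid is attempted on any mount failure. On non-XFS file systems it simply fails as an unknown option, and the dmesg output helps you find the real cause.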

8.    Mount special file systems and change the root directory (chroot) to the newly mounted file system:

# for i in proc sys dev run; do mount --bind /$i $rescuemnt/$i ; done
# chroot $rescuemnt

9.    Download and extract EC2Rescue for Linux onto the offline root volume:

# curl -O https://s3.amazonaws.com/ec2rescuelinux/ec2rl.tgz
# tar -xf ec2rl.tgz

10.    Verify the installation by displaying the help:

# cd ec2rl-<version_number>
# ./ec2rl help
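Because the extracted directory name includes the version number, you can also locate it with find instead of typing the version. This is a sketch with a hypothetical function name. If several versions were extracted into the same directory, it picks the highest version according to sort -V.

```shell
#!/usr/bin/env bash
# latest_ec2rl_dir [PARENT]: print the highest-versioned ec2rl-* directory
# directly under PARENT (default: current directory), or nothing if none exist.
latest_ec2rl_dir() {
  find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -name 'ec2rl-*' | sort -V | tail -n 1
}

# Example, after tar -xf ec2rl.tgz:
#   dir=$(latest_ec2rl_dir) && [ -n "$dir" ] && cd "$dir" && ./ec2rl help
```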

11.    Run EC2Rescue for Linux with no options to run all modules:

# ./ec2rl run

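12.    View the results in /var/tmp/ec2rl. Because you're working inside the chroot, this directory is on the impaired instance's root volume: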

# cat /var/tmp/ec2rl/*/Main.log | more
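The glob /var/tmp/ec2rl/*/Main.log matches the log of every run, so after you run the tool more than once, the output of earlier runs is mixed in. The following sketch, with a hypothetical function name, prints only the Main.log path of the most recent run, based on directory modification time. To read the same logs from outside the chroot, pass $rescuemnt/var/tmp/ec2rl as the argument.

```shell
#!/usr/bin/env bash
# latest_ec2rl_log [ROOT]: print the Main.log path of the newest run directory
# under ROOT (default: /var/tmp/ec2rl). Fails if there are no run directories.
latest_ec2rl_log() {
  local latest
  # ls -t sorts by modification time, newest first.
  latest=$(ls -td "${1:-/var/tmp/ec2rl}"/*/ 2>/dev/null | head -n 1)
  [ -n "$latest" ] || return 1
  echo "${latest%/}/Main.log"
}

# Example (inside the chroot):
#   more "$(latest_ec2rl_log)"
```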

13.    Based on the results, run EC2Rescue for Linux again with remediation enabled for the modules that support it:

# ./ec2rl run --remediate

14.    After remediation is complete, exit the chroot, and then unmount the special file systems and the secondary device:

# exit
# umount $rescuemnt/{proc,sys,dev,run,}

Note: If the unmount operation isn't successful, you might have to stop or reboot the rescue instance to enable a clean unmount.

15.    Detach the secondary volume (/dev/sdf) from the rescue instance, and then attach it to the original instance as its root volume. Use the same device name that you noted in step 2 (/dev/xvda or /dev/sda1).

16.    Start the original instance, and then verify that it's responsive.
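Steps 15 and 16 can also be run with the AWS CLI. As with the earlier sketch, the IDs and device name are placeholders, restore_root_volume and run are hypothetical helpers, and with DRY_RUN=1 (the default) the commands are only printed. The final waiter returns after the instance status check passes, which is a reasonable first sign that the instance is responsive.

```shell
#!/usr/bin/env bash
# Sketch of steps 15-16 with the AWS CLI. IDs and the device name are
# placeholders; helper names are hypothetical. DRY_RUN=1 (default) only prints.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# restore_root_volume VOLUME_ID IMPAIRED_ID ROOT_DEVICE
restore_root_volume() {
  local volume=$1 impaired=$2 root_device=$3
  run aws ec2 detach-volume --volume-id "$volume"
  run aws ec2 wait volume-available --volume-ids "$volume"
  # Reattach under the exact root device name recorded in step 2.
  run aws ec2 attach-volume --volume-id "$volume" \
    --instance-id "$impaired" --device "$root_device"
  run aws ec2 start-instances --instance-ids "$impaired"
  # Waits until the instance status check reports "ok".
  run aws ec2 wait instance-status-ok --instance-ids "$impaired"
}
```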

Note: You can also use an AWS Systems Manager Automation document to troubleshoot connection issues. For more information, see Walkthrough: Run the EC2Rescue tool on unreachable instances. The AWSSupport-ExecuteEC2Rescue document is designed to automate steps normally required to use EC2Rescue for Linux. These steps are a combination of Systems Manager actions, AWS CloudFormation actions, and AWS Lambda functions.
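As a sketch, you can start that automation from the AWS CLI. The instance ID is a placeholder, and the helper names are hypothetical. The UnreachableInstanceId parameter name is an assumption based on the document's published parameters, so the sketch first lists the parameters that the document currently accepts. With DRY_RUN=1 (the default), the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: start AWSSupport-ExecuteEC2Rescue from the AWS CLI.
# ASSUMPTION: the document accepts an UnreachableInstanceId parameter; verify
# with the describe-document output. DRY_RUN=1 (default) only prints commands.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# start_ec2rescue_automation INSTANCE_ID
start_ec2rescue_automation() {
  local instance_id=$1
  # List the document's current parameter names before relying on them.
  run aws ssm describe-document --name AWSSupport-ExecuteEC2Rescue \
    --query 'Document.Parameters[].Name'
  run aws ssm start-automation-execution \
    --document-name AWSSupport-ExecuteEC2Rescue \
    --parameters "UnreachableInstanceId=$instance_id"
}
```

The start command returns an execution ID. You can pass it to aws ssm get-automation-execution --automation-execution-id to follow the progress.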

Additional troubleshooting


Related information

Recover your impaired instances using EC2Rescue and Amazon EC2 Systems Manager Automation

AWS OFFICIAL
Updated 2 years ago