Why is my EC2 Linux instance not booting and going into emergency mode?

When I boot my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance, the instance goes into emergency mode and the boot process fails. Then, the instance is inaccessible. How can I fix this?

Short description

The most common reasons an instance might boot in emergency mode are:

  • A corrupted kernel.
  • Auto-mount failures because of incorrect entries in the /etc/fstab file.

To verify what type of error is occurring, view the instance's console output. You might see a Kernel panic error message in the console output if the kernel is corrupted. Dependency failed messages appear in the console output if auto-mount failures occur.
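
As a sketch, you can retrieve the console output with the AWS Command Line Interface (AWS CLI); the instance ID below is a placeholder, and the --latest option applies only to Nitro-based instances:

$ aws ec2 get-console-output --instance-id i-0123456789abcdef0 --latest --output text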

Resolution

Kernel panic errors

Kernel panic error messages occur when the grub configuration or initramfs file is corrupted. If a problem with the kernel exists, you might see the error "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)" in the console output.

To resolve kernel panic errors:

1.    Revert the kernel to a previous, stable kernel. For instructions on how to revert to a previous kernel, see How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?

2.    After you revert to a previous kernel, reboot the instance. Then, correct the issues on the corrupted kernel.
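
As a sketch, on distributions that use the grubby tool (for example, Amazon Linux 2), listing the installed kernels and switching the default back to an older one looks roughly like the following; the kernel path is a placeholder for one of the kernels installed on your instance:

$ sudo grubby --info=ALL | grep ^kernel
$ sudo grubby --default-kernel
$ sudo grubby --set-default=/boot/vmlinuz-4.14.301-224.520.amzn2.x86_64
$ sudo reboot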

Dependency failed errors

Syntax errors in the /etc/fstab file can cause auto-mount failures that send the instance into emergency mode. The same happens if an Amazon Elastic Block Store (Amazon EBS) volume listed in the file is detached from the instance. If either of these problems occurs, then the console output looks similar to the following:

-------------------------------------------------------------------------------------------------------------------
[DEPEND] Dependency failed for /mnt.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Migrate local... structure to the new structure.
[DEPEND] Dependency failed for Relabel all filesystems, if necessary.
[DEPEND] Dependency failed for Mark the need to relabel after reboot.
[DEPEND] Dependency failed for File System Check on /dev/xvdf.
-------------------------------------------------------------------------------------------------------------------

The preceding example log messages show that the /mnt mount point failed to mount during the boot sequence.

To prevent the boot sequence from entering emergency mode due to mount failures:

  • Add the nofail option to the /etc/fstab entry for each secondary partition (/mnt, in the preceding example). With nofail present, the boot sequence isn't interrupted even if a volume or partition fails to mount.
  • Add 0 as the last column of the /etc/fstab entry for the respective mount point. Setting this last column (the fsck order field) to 0 disables the boot-time file system check so that the instance can boot successfully. See the example entry after this list.
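
For example, a secondary volume entry with both changes applied might look like the following; the UUID is a placeholder, and the six fields are the device, mount point, file system type, mount options, dump flag, and fsck order:

UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee  /mnt  ext4  defaults,noatime,nofail  1  0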

There are three methods you can use to correct the /etc/fstab file.

Important:

Methods 2 and 3 require a stop and start of the instance. Be aware of the following:

  • If your instance is instance store-backed or has instance store volumes containing data, then the data is lost when the instance is stopped. For more information, see Determine the root device type of your instance.
  • If your instance is part of an Amazon EC2 Auto Scaling group, then stopping the instance might terminate it. Instances launched with Amazon EMR, AWS CloudFormation, or AWS Elastic Beanstalk might be part of an Auto Scaling group. Whether the instance is terminated in this scenario depends on the scale-in protection settings for your Auto Scaling group. If your instance is part of an Auto Scaling group, temporarily remove it from the group before starting the resolution steps.
  • Stopping and starting the instance changes the public IP address of your instance. It's a best practice to use an Elastic IP address instead of a public IP address when routing external traffic to your instance.

Method 1: Use the EC2 Serial Console

If you’ve enabled EC2 Serial Console for Linux, you can use it to troubleshoot supported Nitro-based instance types. The serial console helps you troubleshoot boot issues, network configuration, and SSH configuration issues. The serial console connects to your instance without the need for a working network connection. You can access the serial console using the Amazon EC2 console or the AWS Command Line Interface (AWS CLI).

Before using the serial console, grant access to it at the account level. Then, create AWS Identity and Access Management (IAM) policies granting access to your IAM users. Also, every instance using the serial console must include at least one password-based user. If your instance is unreachable and you haven’t configured access to the serial console, then follow the instructions in Method 2. For information on configuring the EC2 Serial Console for Linux, see Configure access to the EC2 Serial Console.
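
As a sketch, you can check and turn on the account-level setting with the AWS CLI:

$ aws ec2 get-serial-console-access-status
$ aws ec2 enable-serial-console-access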

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

Method 2: Run the AWSSupport-ExecuteEC2Rescue automation document

If your instance is configured for AWS Systems Manager, you can run the AWSSupport-ExecuteEC2Rescue automation document to correct boot issues. Manual intervention isn't needed when using this method. For information on using the automation document, see Walkthrough: Run the EC2Rescue tool on unreachable instances.
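
For example, you can start the automation from the AWS CLI; the instance ID below is a placeholder, and the document also accepts optional parameters, such as SubnetId, that aren't shown here:

$ aws ssm start-automation-execution --document-name "AWSSupport-ExecuteEC2Rescue" --parameters "UnreachableInstanceId=i-0123456789abcdef0"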

Method 3: Manually edit the file using a rescue instance

1.    Open the Amazon EC2 console.

2.    Choose Instances from the navigation pane, and then select the instance that's in emergency mode.

3.    Stop the instance.

4.    Detach the Amazon EBS root volume (/dev/xvda or /dev/sda1) from the stopped instance.

5.    Launch a new EC2 instance in the same Availability Zone as the impaired instance. The new instance becomes your rescue instance.

6.    Attach the root volume you detached in step 4 to the rescue instance as a secondary device.

Note: You can use different device names when attaching secondary volumes.
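
If you prefer the AWS CLI, steps 3 through 6 map roughly to the following commands; the instance and volume IDs are placeholders for your own:

$ aws ec2 stop-instances --instance-ids i-1111111111aaaaaaa
$ aws ec2 wait instance-stopped --instance-ids i-1111111111aaaaaaa
$ aws ec2 detach-volume --volume-id vol-0123456789abcdef0
$ aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-2222222222bbbbbbb --device /dev/sdf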

7.    Connect to your rescue instance using SSH.

8.    Create a mount point directory for the new volume attached to the rescue instance in step 6. In the following example, the mount point directory is /mnt/rescue.

$ sudo mkdir /mnt/rescue

9.    Mount the volume at the directory you created in step 8.

$ sudo mount /dev/xvdf /mnt/rescue

Note: The device (/dev/xvdf, in the preceding example) might be attached to the rescue instance with a different device name. Run the lsblk command to view the available disk devices and their mount points, and then determine the correct device name.

10.    After the volume is mounted, run the following command to open the /etc/fstab file.

$ sudo vi /mnt/rescue/etc/fstab

11.    Edit the entries in /etc/fstab as needed. The following example output shows three EBS volumes defined with UUIDs, the nofail option added for both secondary volumes, and a 0 as the last column for each entry.

------------------------------------------------------------------------------------------
$ cat /etc/fstab
UUID=e75a1891-3463-448b-8f59-5e3353af90ba  /  xfs  defaults,noatime  1  0
UUID=87b29e4c-a03c-49f3-9503-54f5d6364b58  /mnt/rescue  ext4  defaults,noatime,nofail  1  0
UUID=ce917c0c-9e37-4ae9-bb21-f6e5022d5381  /mnt  ext4  defaults,noatime,nofail  1  0  
------------------------------------------------------------------------------------------
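
After saving your changes, you can optionally sanity-check the edited file before unmounting. The findmnt verification mode shown below is available in recent versions of util-linux; note that the path points at the fstab file on the mounted rescue volume:

$ sudo findmnt --verify --tab-file /mnt/rescue/etc/fstab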

12.    Save the file, and then run the umount command to unmount the volume.

$ sudo umount /mnt/rescue

13.    Detach the volume from the rescue instance.

14.    Attach the volume to the original instance, and then start the instance to confirm that it boots successfully.
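
As with the earlier steps, the detach, reattach, and start can also be done from the AWS CLI; the IDs are placeholders, and the --device value must match the root device name you noted in step 4:

$ aws ec2 detach-volume --volume-id vol-0123456789abcdef0
$ aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-1111111111aaaaaaa --device /dev/xvda
$ aws ec2 start-instances --instance-ids i-1111111111aaaaaaa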

