I'm receiving a "Kernel panic" error after I've upgraded the kernel or tried to reboot my EC2 Linux instance. How can I fix this?

After I completed a kernel or system upgrade, or after a system reboot, my Amazon Elastic Compute Cloud (Amazon EC2) instance fails to boot and the following message appears:

"VFS: Cannot open root device XXX or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)"

Short description

Your instance might fail to boot and show the kernel panic error message for the following reasons:

  • The initramfs or initrd image is missing from the newly updated kernel configuration in /boot/grub/grub.conf. Or, the initrd or initramfs file itself is missing from the /boot directory.
  • The kernel or system packages weren't fully installed during the upgrade process due to insufficient space.
  • Third-party modules are missing from the initrd or initramfs image. For example, NVMe, LVM, or RAID modules.

Resolution

The initramfs or initrd image is missing from the /boot/grub/grub.conf or /boot directory

Use one of the following methods to correct this:

Method 1: Use the EC2 Serial Console

If you turned on EC2 Serial Console for Linux, then you can use it to troubleshoot supported Nitro-based instance types. The serial console helps you troubleshoot boot, network configuration, and SSH configuration issues. The serial console connects to your instance without the need for a working network connection. You can access the serial console using the Amazon EC2 console or the AWS Command Line Interface (AWS CLI).

Before using the serial console, grant access to it at the account level. Then, create AWS Identity and Access Management (IAM) policies granting access to your IAM users. Also, every instance using the serial console must include at least one password-based user. If your instance is unreachable and you haven't configured access to the serial console, follow the instructions in Method 2: Use a rescue instance. For information on configuring the EC2 Serial Console for Linux, see Configure access to the EC2 Serial Console.

Note: If you receive errors when running AWS CLI commands, make sure that you're using the most recent version of the AWS CLI.
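If you use the AWS CLI, the account-level setting can be turned on with the following sketch. This is a non-authoritative example that assumes the AWS CLI is installed with credentials configured; the setting applies to the whole account in the current Region.

```shell
# Sketch: turn on EC2 Serial Console access at the account level and
# confirm the setting. Assumes the AWS CLI is installed and configured.
enable_serial_console_access() {
  aws ec2 enable-serial-console-access
  # Prints "True" when access is enabled for the account
  aws ec2 get-serial-console-access-status \
      --query 'SerialConsoleAccessEnabled' --output text
}

# Example: enable_serial_console_access
```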

Method 2: Use a rescue instance

Warning:

  • This procedure requires a stop and start of your EC2 instance. Be aware that if your instance is instance store-backed or has instance store volumes containing data, the data is lost when you stop the instance. For more information, see Determine the root device type of your instance.
  • If you launch instances using EC2 Auto Scaling, stopping the instance might terminate the instance. Some AWS services use EC2 Auto Scaling to launch instances, such as Amazon EMR, AWS CloudFormation, and AWS Elastic Beanstalk. Check the instance scale-in protection settings for your Auto Scaling group. If your instance is part of an Auto Scaling group, temporarily remove the instance from the Auto Scaling group before starting the resolution steps.
  • When you stop and start an instance, the public IP address of your instance changes. It's a best practice to use an Elastic IP address instead of a public IP address when routing external traffic to your instance.

1.    Open the Amazon EC2 console.

2.    Choose Instances from the navigation pane, and then select the impaired instance.

3.    Choose Actions, Instance State, Stop instance.

4.    In the Storage tab, select the Root device, and then select the Volume ID.

Note: You can create a snapshot of the root volume as a backup before proceeding to step 5.

5.    Choose Actions, Detach Volume (/dev/sda1 or /dev/xvda), and then choose Yes, Detach.

6.    Verify that the State is Available.

7.    Launch a new EC2 instance in the same Availability Zone and with the same operating system and same kernel version as the original instance. You can install the appropriate kernel version after the initial launch and then perform a reboot. This new instance is your rescue instance.

8.    After the rescue instance launches, choose Volumes from the navigation pane, and then select the detached root volume of the original instance.

9.    Choose Actions, Attach Volume.

10.    Select the rescue instance ID (i-xxxx) and then enter /dev/xvdf.
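Steps 5, 9, and 10 can also be performed with the AWS CLI. The sketch below is an illustrative example that assumes the AWS CLI is installed with credentials configured; the volume and instance IDs shown are placeholders.

```shell
# Sketch: detach the impaired instance's root volume and attach it to
# the rescue instance. Volume and instance IDs are placeholders.
move_root_volume() {
  volume_id="$1"
  rescue_instance_id="$2"
  # Optional backup before detaching (see the note at step 4)
  aws ec2 create-snapshot --volume-id "$volume_id" \
      --description "Backup before rescue procedure"
  aws ec2 detach-volume --volume-id "$volume_id"
  # Wait until the volume State is Available (step 6)
  aws ec2 wait volume-available --volume-ids "$volume_id"
  aws ec2 attach-volume --volume-id "$volume_id" \
      --instance-id "$rescue_instance_id" --device /dev/xvdf
}

# Example: move_root_volume vol-0123456789abcdef0 i-0123456789abcdef0
```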

11.     Run the following command to verify that the root volume of the impaired instance attached to the rescue instance successfully:

$ lsblk

The following is an example of the output:

NAME    MAJ:MIN   RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0     0   15G  0  disk
└─xvda1 202:1     0   15G  0  part /
xvdf    202:80    0   15G  0  disk
└─xvdf1 202:81    0   15G  0  part

12.    Mount the root partition of the impaired instance's volume under /mnt:

$ mount -o nouuid /dev/xvdf1 /mnt

13.    Prepare a chroot environment by bind mounting the dev, proc, sys, and run file systems:

$ for i in dev proc sys run; do mount -o bind /$i /mnt/$i; done

14.    Run the chroot command on the mounted /mnt file system:

$ chroot /mnt

Note: The working directory is changed to "/".
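The mount and chroot steps above can be combined into one short script. This sketch assumes the volume is attached as /dev/xvdf with the root file system on /dev/xvdf1, and that you run it as root on the rescue instance.

```shell
# Sketch of the mount-and-chroot steps: mount the impaired root volume,
# bind mount the special file systems, then chroot into it. Run as
# root; assumes the root partition is /dev/xvdf1.
enter_rescue_chroot() {
  mount -o nouuid /dev/xvdf1 /mnt
  for i in dev proc sys run; do
    mount -o bind "/$i" "/mnt/$i"
  done
  chroot /mnt
}
```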

15.    Run the following commands based on your operating system.

RPM-based operating systems:

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
$ sudo dracut -f -vvvvv

Debian-based operating systems:

$ sudo update-grub && sudo update-grub2
$ sudo update-initramfs -u -v

16.    Verify that the initrd or initramfs image is present in the /boot directory and that the image has a corresponding kernel image. For example, vmlinuz-4.14.138-114.102.amzn2.x86_64 and initramfs-4.14.138-114.102.amzn2.x86_64.img.
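The check in step 16 can be scripted. The function below is an illustrative sketch: it scans a directory for vmlinuz-* kernels and reports any that lack a matching initramfs-*.img or initrd.img-* file (the two common naming schemes on RPM-based and Debian-based systems).

```shell
# Report kernels in a directory that have no matching initramfs or
# initrd image. Pass /boot on a real system.
check_boot_images() {
  dir="$1"
  missing=0
  for k in "$dir"/vmlinuz-*; do
    [ -e "$k" ] || continue            # no kernels at all: glob unmatched
    ver="${k##*/vmlinuz-}"
    if [ ! -e "$dir/initramfs-$ver.img" ] && [ ! -e "$dir/initrd.img-$ver" ]; then
      echo "missing initramfs/initrd for kernel $ver"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_boot_images /boot && echo "every kernel has an image"
```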

17.    After verifying that the latest kernel has a corresponding initrd or initramfs image, run the following commands to exit and clean up the chroot environment:

$ exit
$ umount /mnt/{dev,proc,run,sys,}

18.    Detach the root volume from the rescue instance and attach the volume to the original instance.

19.    Start the original instance.

The kernel or system package wasn't fully installed during an update

Revert to a previous kernel version. For instructions, see How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?

Third-party modules are missing from the initrd or initramfs image

Investigate to determine what module or modules are missing from the initrd or initramfs image. Then verify if you can add the module back to the image. In many cases, it's easier to rebuild the instance.

The following is example console output from an Amazon Linux 2 instance running on the Nitro platform. The instance is missing the nvme.ko module from the initramfs image:

dracut-initqueue[1180]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[1180]: Warning: Could not boot.
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
dracut-initqueue[1180]: Warning: /dev/disk/by-uuid/55da5202-8008-43e8-8ade-2572319d9185 does not exist
dracut-initqueue[1180]: Warning: Boot has failed. To debug this issue add "rd.shell rd.debug" to the kernel command line.
Starting Show Plymouth Power Off Screen...

To determine if the kernel panic error is caused by a missing third-party module or modules, do the following:

1.    Use Method 1: Use the EC2 Serial Console in the preceding section to create a chroot environment in the root volume of the non-booting instance.

-or-

Follow steps 1-14 in Method 2: Use a rescue instance in the preceding section to create a chroot environment in the root volume of the non-booting instance.

2.    Use one of the following three options to determine which module or modules are missing from the initramfs or initrd image:

Option 1: Run the dracut -f -v command in the /boot directory to determine whether rebuilding the initrd or initramfs image fails, and to list which module or modules are missing.
Note: The dracut -f -v command might add any missing modules to the initrd or initramfs image. If the command doesn't find errors, try to reboot the instance. If the instance reboots successfully, then the command resolved the error.

Option 2: Run the lsinitrd initramfs-4.14.138-114.102.amzn2.x86_64.img | less command to view the contents of the initrd or initramfs file. Replace initramfs-4.14.138-114.102.amzn2.x86_64.img with the name of your image.
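To check for one specific module with Option 2, you can pipe lsinitrd through grep. The helper below is a sketch; the image and module names are placeholders, and it assumes lsinitrd (part of dracut) is available.

```shell
# Sketch: check whether a given kernel module is packed into an
# initramfs image. Image and module names are example placeholders.
check_module_in_initramfs() {
  img="$1"
  mod="$2"
  if lsinitrd "$img" | grep -q "${mod}\.ko"; then
    echo "$mod is present in $img"
  else
    echo "$mod is MISSING from $img"
  fi
}

# Example:
# check_module_in_initramfs initramfs-4.14.138-114.102.amzn2.x86_64.img nvme
```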

Option 3: Inspect the /usr/lib/modules directory.

3.     If you find a missing module, you can try to add it back to the kernel. For information on how to obtain and add modules into the kernel, see the documentation specific to your Linux distribution.

