Why do I see a "Kernel panic" error after I upgrade the kernel or reboot my EC2 Linux instance?


I upgraded the kernel or system or performed a system reboot on my Amazon Elastic Compute Cloud (Amazon EC2) instance. Now, the instance fails to boot, and I see the following error:

"VFS: Cannot open root device XXX or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)"

Short description

The following issues lead to boot failure and kernel panic error messages:

  • The initramfs or initrd image is missing from the newly updated kernel configuration in /boot/grub/grub.conf. Or, the initrd or initramfs file is missing from the /boot directory.
  • The kernel or system packages weren't fully installed during the upgrade process due to insufficient space.
  • Third-party modules are missing from the initrd or initramfs image. For example, NVMe, LVM, or RAID modules.
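
As a rough way to check for the first cause, the following sketch lists kernels in a boot directory that have no matching initramfs or initrd file. The function name kernels_without_initrd is hypothetical, and the sketch assumes the common naming schemes vmlinuz-<version> paired with initramfs-<version>.img (RPM-based) or initrd.img-<version> (Debian-based). Other layouts need adjusting.

```shell
# Print the version of every kernel in a boot directory that lacks a
# matching initramfs/initrd file. Assumes the naming schemes
# vmlinuz-<ver> + initramfs-<ver>.img or initrd.img-<ver>.
kernels_without_initrd() {
    bootdir="${1:-/boot}"
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue        # the glob matched nothing
        ver="${kernel##*/vmlinuz-}"         # strip the path and prefix
        if [ ! -e "$bootdir/initramfs-$ver.img" ] &&
           [ ! -e "$bootdir/initrd.img-$ver" ]; then
            echo "$ver"
        fi
    done
}
```

For example, kernels_without_initrd /mnt/boot checks a root volume that's mounted under /mnt on a rescue instance. An empty result means that every kernel found has a matching image file. It doesn't confirm that the boot loader configuration references that image.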

Resolution

The initramfs or initrd image is missing from the /boot/grub/grub.conf or /boot directory

Use one of the following methods to correct this:

Use the EC2 Serial Console

If you turned on the EC2 serial console for Linux instances, then you can use it to troubleshoot supported Nitro-based instance types and bare metal instances. The serial console helps you troubleshoot boot issues and network and SSH configuration issues. The serial console connects to your instance without needing a working network connection. To access the serial console, use the Amazon EC2 console or the AWS Command Line Interface (AWS CLI).

If you're using the EC2 serial console for the first time, then make sure that you review the prerequisites, and configure access before trying to connect.

If your instance is unreachable and you haven't configured access to the serial console, then follow the instructions in Use a rescue instance. For information on configuring the EC2 serial console for Linux instances, see Configure access to the EC2 serial console.

Note: If you receive errors when running AWS CLI commands, make sure that you're using the most recent version of the AWS CLI.

Use a rescue instance

Warning: The following procedure requires stopping the instance. Data that's stored in instance store volumes is lost when the instance is stopped. Make sure that you save a backup of the data before stopping the instance. Unlike Amazon Elastic Block Store (Amazon EBS)-backed volumes, instance store volumes are ephemeral and don't support data persistence.

The public IPv4 address that Amazon EC2 automatically assigned to the instance on launch or start changes after the stop and start. To retain a public IPv4 address that doesn't change when the instance is stopped, use an Elastic IP address.

For more information, see What happens when you stop an instance.

1.    Open the Amazon EC2 console.

2.    Choose Instances from the navigation pane, and then select the impaired instance.

3.    Choose Actions, Instance State, Stop instance.

4.    In the Storage tab, select the Root device, and then select the Volume ID.

Note: It's a best practice to create a snapshot of the root volume as a backup before proceeding.

5.    Choose Actions, Detach Volume (/dev/sda1 or /dev/xvda), and then choose Yes, Detach.

6.    Verify that the State is Available.

7.    Launch a new EC2 instance in the same Availability Zone and with the same operating system as the impaired instance. The new instance becomes your rescue instance.

Or, you can use an existing instance that uses the same AMI and is in the same Availability Zone as your impaired instance.

8.    After the rescue instance launches, choose Volumes from the navigation pane, and then select the detached root volume of the original instance.

9.    Choose Actions, Attach Volume.

10.    Select the rescue instance ID (i-xxxx), and then enter /dev/xvdf as the device name.

11.    Connect to the rescue instance, and then run the following command to verify that the root volume of the impaired instance successfully attached to the rescue instance:

$ lsblk

The following is example output from a Nitro instance:

NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0   15G  0 disk
└─nvme0n1p1   259:1    0   15G  0 part /
nvme1n1       259:2    0   15G  0 disk
└─nvme1n1p1   259:3    0   15G  0 part

The following is example output from a Xen instance:

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   15G  0 disk
└─xvda1 202:1    0   15G  0 part /
xvdf    202:80   0   15G  0 disk
└─xvdf1 202:81   0   15G  0 part

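12.    Mount the root partition of the impaired instance's volume under /mnt. Replace /dev/nvme1n1p1 with the partition name from the lsblk output (for example, /dev/xvdf1 on a Xen instance). The nouuid option applies only to XFS file systems, so omit it for other file systems such as ext4. Run this and the following commands as the root user.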

$ mount -o nouuid /dev/nvme1n1p1 /mnt

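13.    Run the following command to bind mount the /dev, /proc, /sys, and /run directories of the rescue instance into the mounted volume so that they're available inside the chroot environment: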

$ for i in dev proc sys run; do mount -o bind /$i /mnt/$i; done

14.    Run the chroot command on the mounted /mnt file system:

$ chroot /mnt

Note: The working directory is changed to "/".

15.    Run the following commands based on your operating system.

RPM-based operating systems

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
$ sudo dracut -f -v

Note: Without further arguments, dracut -f builds the image for the kernel that's currently running, which inside the chroot environment is the rescue instance's kernel. To rebuild the image for a different installed kernel, pass the image path and the kernel version. For example: dracut -f -v /boot/initramfs-4.14.138-114.102.amzn2.x86_64.img 4.14.138-114.102.amzn2.x86_64

Debian-based operating systems

$ sudo update-grub && sudo update-grub2
$ sudo update-initramfs -u -v

16.    Verify that the initrd or initramfs image is present in the /boot directory and that the image has a corresponding kernel image. For example, vmlinuz-4.14.138-114.102.amzn2.x86_64 and initramfs-4.14.138-114.102.amzn2.x86_64.img.

17.    Run the following commands to exit and clean up the chroot environment:

$ exit
$ umount /mnt/{dev,proc,run,sys,}

18.    Detach the root volume from the rescue instance and attach the volume to the original instance.

19.    Start the original instance.
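
The console steps that stop the instance and move the volume can also be approximated with the AWS CLI. The following sketch only prints the commands so that you can review them before you run anything. The function name rescue_plan and all IDs are hypothetical placeholders. The last argument is the original root device name (/dev/xvda or /dev/sda1) and is assumed to be /dev/xvda when omitted.

```shell
# Print (do not run) AWS CLI calls that roughly mirror the console steps.
# Arguments: impaired instance ID, root volume ID, rescue instance ID,
# and optionally the original root device name (default: /dev/xvda).
rescue_plan() {
    impaired="$1"; volume="$2"; rescue="$3"; root_dev="${4:-/dev/xvda}"
    cat <<EOF
aws ec2 stop-instances --instance-ids $impaired
aws ec2 wait instance-stopped --instance-ids $impaired
aws ec2 create-snapshot --volume-id $volume --description "pre-rescue backup"
aws ec2 detach-volume --volume-id $volume
aws ec2 wait volume-available --volume-ids $volume
aws ec2 attach-volume --volume-id $volume --instance-id $rescue --device /dev/xvdf
# ... repair the volume from the rescue instance, then:
aws ec2 detach-volume --volume-id $volume
aws ec2 wait volume-available --volume-ids $volume
aws ec2 attach-volume --volume-id $volume --instance-id $impaired --device $root_dev
aws ec2 start-instances --instance-ids $impaired
EOF
}

# Example with placeholder IDs; replace them with your own.
rescue_plan i-0123456789abcdef0 vol-0123456789abcdef0 i-0fedcba9876543210
```

Printing the plan instead of running it is a deliberate choice: stopping an instance and detaching its root volume are disruptive, so each command is worth checking against your own instance and volume IDs first.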

The kernel or system package wasn't fully installed during an update

Revert to a previous kernel version. For instructions, see How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?
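
Because insufficient space is the stated cause of an incomplete installation, it can help to check free space before you reinstall or update packages. The following is a minimal sketch. The function name has_free_space is hypothetical, and the 100 MiB threshold in the usage note is an arbitrary assumption, not an AWS recommendation.

```shell
# Succeed if the file system that holds <path> has at least <min_mb> MiB
# available. Uses POSIX df output (-P) in 1 KiB blocks (-k).
has_free_space() {
    path="$1"; min_mb="$2"
    avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
    [ $((avail_kb / 1024)) -ge "$min_mb" ]
}
```

For example, inside the chroot environment you might run has_free_space /boot 100 || echo "low space on /boot" and repeat the check for /.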

Third-party modules are missing from the initrd or initramfs image

Investigate to determine what module or modules are missing from the initrd or initramfs image. Then, verify if you can add the module back to the image. In many cases, it's easier to rebuild the instance.

The following is example console output from an Amazon Linux 2 instance running on the Nitro platform. The instance is missing the nvme.ko module from the initramfs image:

dracut-initqueue[1180]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[1180]: Warning: Could not boot.
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
dracut-initqueue[1180]: Warning: /dev/disk/by-uuid/55da5202-8008-43e8-8ade-2572319d9185 does not exist
dracut-initqueue[1180]: Warning: Boot has failed. To debug this issue add "rd.shell rd.debug" to the kernel command line.
Starting Show Plymouth Power Off Screen...
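
To tell this dracut failure apart from the VFS panic quoted at the top of this article, you can search the console output for the two signatures. The following sketch reads console text on standard input. The function name classify_boot_log and its output labels are hypothetical, and the matching is a heuristic based only on the messages shown in this article.

```shell
# Read console output on stdin and print a coarse label for the failure.
classify_boot_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'dracut-initqueue.*does not exist'; then
        # The initramfs ran, but the root device never appeared.
        echo "dracut-root-device-timeout"
    elif printf '%s\n' "$log" | grep -q 'VFS: Unable to mount root fs on unknown-block'; then
        # The kernel could not mount a root file system at all.
        echo "vfs-root-mount-panic"
    else
        echo "unrecognized"
    fi
}
```

One way to feed it is aws ec2 get-console-output --instance-id i-xxxx --output text | classify_boot_log, where i-xxxx is a placeholder for the ID of the impaired instance.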

To determine if the kernel panic error is caused by a missing third-party module or modules, do the following:

1.    Use the EC2 serial console to create a chroot environment in the root volume of the impaired instance.

Or

Follow steps 1-14 in the Use a rescue instance section to create a chroot environment in the root volume of the non-booting instance.

2.    Use one of the following three options to determine which modules are missing from the initramfs or initrd image:

Option 1: To determine if the initrd or initramfs image rebuild fails, run the dracut -f -v command in the /boot directory. The verbose output of the command also lists the missing modules.

Note: The dracut -f -v command might add the missing modules to the initrd or initramfs image. If the command doesn't find errors, try to reboot the instance. If the instance reboots successfully, then the command resolved the error.

Option 2: Run the lsinitrd initramfs-4.14.138-114.102.amzn2.x86_64.img | less command to view the contents of the initrd or initramfs file. Replace initramfs-4.14.138-114.102.amzn2.x86_64.img with the name of your image.

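Option 3: Inspect the /usr/lib/modules directory to confirm that it contains a directory for the kernel version that you're booting and that the required module files are present there.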

3.    If you find a missing module, then try to add it back to the kernel. For information on how to obtain and add modules into the kernel, see the documentation specific to your Linux distribution.
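
Option 2 can be narrowed to specific modules by filtering the lsinitrd listing. The following sketch reads a listing on standard input and prints each named module that has no matching .ko file, including compressed variants such as .ko.xz. The function name missing_from_listing is hypothetical, and the module names that you pass are your own assumptions about what the instance needs.

```shell
# Read an lsinitrd-style listing on stdin and print every module name
# given as an argument that has no <name>.ko (or <name>.ko.<ext>) entry.
missing_from_listing() {
    listing=$(cat)
    for mod in "$@"; do
        if ! printf '%s\n' "$listing" |
             grep -Eq "/${mod}\.ko(\.[a-z0-9]+)?\$"; then
            echo "$mod"
        fi
    done
}
```

For example: lsinitrd /boot/initramfs-4.14.138-114.102.amzn2.x86_64.img | missing_from_listing nvme nvme-core. Replace the image name with the name of your image. Module file names can use hyphens where the loaded module name uses underscores, so pass the name as it appears on disk.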

AWS OFFICIAL
Updated 8 months ago
2 Comments

The first option almost worked for me. After running chroot /mnt, I had to complete a broken yum update (yum-complete-transaction), run yum update/upgrade, reinstall the kernel (yum reinstall kernel) and run grub2-mkconfig -o /boot/grub2/grub.cfg.

edvin
replied 6 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 6 months ago