How do I fix a "Kernel panic - not syncing" error on EC2 instances?

I want to learn how to troubleshoot and resolve the "kernel panic" error that occurs after I upgrade the kernel or reboot my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance and the initramfs or kernel modules are missing.

Short description

The "Kernel panic - not syncing" error indicates that there is no such device or address. To resolve this error, set up a recovery instance where the faulty root disk is attached as a secondary drive for diagnostics to be performed.

Note: The following resolution applies to Amazon Linux 2, Amazon Linux 2023, Fedora 16 and later, and RHEL 7 and later.

Resolution

To attach the root disk to a rescue instance, complete the following steps:

  1. Create a new key pair or use an existing key pair.
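
    If you use the AWS CLI, a minimal sketch for this step looks similar to the following. The key pair name recovery-key is a placeholder:

    # Create a key pair and save the private key locally
    aws ec2 create-key-pair --key-name recovery-key --query "KeyMaterial" --output text > recovery-key.pem
    chmod 400 recovery-key.pem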

  2. Get the volume ID and device name for the original instance's root volume.
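
    If you use the AWS CLI, a command similar to the following returns the root device name and the attached volume IDs. The instance ID is a placeholder:

    aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
        --query "Reservations[].Instances[].{Root:RootDeviceName,Mappings:BlockDeviceMappings}"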

  3. Stop the original instance.
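
    For example, with the AWS CLI (the instance ID is a placeholder):

    aws ec2 stop-instances --instance-ids i-0123456789abcdef0
    aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0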

  4. Launch a recovery instance from an Amazon Machine Image (AMI) with the same Linux operating system (OS) version in the same Availability Zone.
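
    The following AWS CLI sketch launches a recovery instance. The AMI ID, instance type, and Availability Zone are placeholders; choose an AMI that matches the original instance's OS version:

    aws ec2 run-instances --image-id ami-0123456789abcdef0 \
        --instance-type t3.micro --key-name recovery-key \
        --placement AvailabilityZone=us-east-1a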

  5. Detach the root volume from the original instance and attach it to the recovery instance as a secondary volume. Note the volume device name.
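
    For example, with the AWS CLI (the volume ID, instance ID, and device name are placeholders):

    aws ec2 detach-volume --volume-id vol-0123456789abcdef0
    aws ec2 wait volume-available --volume-ids vol-0123456789abcdef0
    # Attach the faulty root volume to the recovery instance as a secondary volume
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0fedcba9876543210 --device /dev/sdf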

  6. Connect to the recovery instance with your SSH key pair.
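
    For example (the key file and public DNS name are placeholders):

    ssh -i recovery-key.pem ec2-user@ec2-192-0-2-0.compute-1.amazonaws.com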

  7. To change to the root user, run the following command:

    [ec2-user ~]$ sudo su
  8. To identify the block device name and partition, run the following command from the recovery instance:

    [root ~]$ lsblk
    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda    202:0    0    8G  0 disk
    └─xvda1 202:1    0    8G  0 part /
    xvdf    202:80   0  101G  0 disk
    └─xvdf1 202:81   0  101G  0 part

    The preceding example uses a Xen instance with blkfront drivers. Both /dev/xvda and /dev/xvdf are partitioned volumes. If your volume is partitioned, then run the following command to mount the partition (/dev/xvdf1) instead of the raw device (/dev/xvdf):

    [root ~]$ mount -o nouuid  /dev/xvdf1 /mnt

    If you use a Nitro-based instance, then the volume device name looks similar to /dev/nvme[0-26]n1. If your instance is built on Nitro with NVMe, then mount the partition at the /mnt directory. Use the device name that you identified earlier with the lsblk command:

    [root ~]$ mount -o nouuid  /dev/nvme1n1p1 /mnt

    For more information, see Device names on Linux instances.
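
    On Nitro instances, you can also map an NVMe device name back to its Amazon EBS volume ID, because the volume ID is exposed as the NVMe serial number (without the hyphen, for example vol0123456789abcdef0):

    [root ~]$ lsblk -o NAME,SIZE,SERIAL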

  9. To create a chroot environment in the /mnt directory, run the following command:

    [root ~]$ for i in dev proc sys run; do mount -o bind /$i /mnt/$i; done; chroot /mnt

    In the preceding example, the /dev, /proc, /sys, and /run directories are bind-mounted from the original root file system. This allows processes that run inside the chroot environment to access these system directories.

  10. To create a backup of the initramfs in the "/" directory, run the following command:

    [root ~]$ for file in /boot/initramfs-*.img; do cp "${file}" "/$(basename "$file")_$(date +%Y%m%d)"; done
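
    To confirm that the backup copies exist, list them:

    [root ~]$ ls -lh /initramfs-*.img_*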
  11. To list the default kernel, run the following command:

    [root ~]$ grubby --default-kernel
    /boot/vmlinuz-5.15.156-102.160.amzn2.x86_64

    The preceding output shows the kernel that tries to boot at startup.

  12. To list the kernels and initramfs files in the /boot directory, run the following command:

    [root ~]$ ls -lh /boot/vmlinuz* && ls -lh /boot/initr*
    -rwxr-xr-x. 1 root root 9.7M Apr 23 20:37 /boot/vmlinuz-5.10.215-203.850.amzn2.x86_64
    -rwxr-xr-x. 1 root root 9.9M Apr 23 17:00 /boot/vmlinuz-5.15.156-102.160.amzn2.x86_64
    -rw-------. 1 root root 12M May 3 23:45 /boot/initramfs-5.10.215-203.850.amzn2.x86_64.img
    -rw-------. 1 root root 9.8M May 14 08:03 /boot/initramfs-5.15.156-102.160.amzn2.x86_64.img

    Note whether each vmlinuz kernel file has a corresponding initramfs file.
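
    To quickly find kernels that are missing an initramfs, you can run a short loop similar to the following:

    [root ~]$ for k in /boot/vmlinuz-*; do v="${k#/boot/vmlinuz-}"; [ -e "/boot/initramfs-${v}.img" ] || echo "Missing initramfs for kernel ${v}"; done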

  13. To rebuild the initramfs, run the following command. Replace <kernelVersion> with the default kernel version that you found in step 11:

    [root ~]$ dracut --force --verbose /boot/initramfs-<kernelVersion>.img <kernelVersion>
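
    For example, with the default kernel version from step 11:

    [root ~]$ dracut --force --verbose /boot/initramfs-5.15.156-102.160.amzn2.x86_64.img 5.15.156-102.160.amzn2.x86_64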
  14. To determine if the instance is booting on UEFI or BIOS, run the following command:

    [root ~]$ boot_mode=$(ls /sys/firmware/efi/efivars >/dev/null 2>&1 && echo "EFI" || echo "BIOS"); echo "Boot mode detected: $boot_mode"
  15. To update the grub configuration, choose one of the following commands based on the output from step 14.
    For BIOS, run the following command:

    [root ~]$ grub2-mkconfig -o /boot/grub2/grub.cfg

    For UEFI, run one of the following commands.
    Amazon Linux 2 and Amazon Linux 2023:

    [root ~]$ grub2-mkconfig -o /boot/efi/EFI/amzn/grub.cfg

    Fedora 16+:

    [root ~]$ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

    Red Hat 7+:

    [root ~]$ grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
  16. To exit and detach the volume, run the following command:

    [root ~]$ exit; umount -fl /mnt
  17. Detach the secondary volume from the recovery instance. Then, attach it to the original instance as the root device, and use the root device name that you noted in step 2. After the volume is attached, start the instance.
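
    If you use the AWS CLI, the reattachment looks similar to the following sketch. The volume ID and instance ID are placeholders; use the root device name that you noted in step 2, such as /dev/xvda:

    aws ec2 detach-volume --volume-id vol-0123456789abcdef0
    aws ec2 wait volume-available --volume-ids vol-0123456789abcdef0
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 --device /dev/xvda
    aws ec2 start-instances --instance-ids i-0123456789abcdef0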

AWS OFFICIAL
Updated 3 months ago
2 Comments

The first option almost worked for me. After running chroot /mnt, I had to complete a broken yum update (yum-complete-transaction), run yum update/upgrade, reinstall the kernel (yum reinstall kernel), and run grub2-mkconfig -o /boot/grub2/grub.cfg.

edvin
answered 1 year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
Administrator
answered 1 year ago