How do I revert to a known stable kernel after an update prevents my Amazon Elastic Compute Cloud (Amazon EC2) instance from rebooting successfully?


Short description

If you performed a kernel update on your EC2 Linux instance and the new kernel is corrupt, then the instance can't reboot, and you can't use SSH to connect to the impaired instance.

To revert to the previous kernel version, do the following:

1.    Access the instance's root volume.

2.    Update the default kernel in the GRUB bootloader.

Resolution

Access the instance's root volume

There are two methods to access the root volume:

Method 1: Use the EC2 Serial Console

If you enabled EC2 Serial Console for Linux, then you can use it to troubleshoot supported Nitro-based instance types. The serial console helps you troubleshoot boot issues, network configuration, and SSH configuration issues. The serial console connects to your instance without the need for a working network connection. You can access the serial console using the Amazon EC2 console or the AWS Command Line Interface (AWS CLI).

Before using the serial console, grant access to it at the account level. Then create AWS Identity and Access Management (IAM) policies granting access to your IAM users. Also, every instance using the serial console must include at least one password-based user. If your instance is unreachable and you haven’t configured access to the serial console, then follow the instructions in Method 2. For information on configuring the EC2 Serial Console for Linux, see Configure access to the EC2 Serial Console.

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

Method 2: Use a rescue instance

Create a temporary rescue instance, and then remount your Amazon Elastic Block Store (Amazon EBS) volume on the rescue instance. From the rescue instance, you can configure GRUB to boot from the previous kernel.

Important: Don't perform this procedure on an instance store-backed instance. Because the recovery procedure requires a stop and start of your instance, any data on that instance is lost. For more information, see Determine the root device type of your instance.

1.    Create an EBS snapshot of the root volume. For more information, see Create Amazon EBS snapshots.

2.    Open the Amazon EC2 console.

Note: Be sure that you're in the correct Region.

3.    Select Instances from the navigation pane, and then choose the impaired instance.

4.    Choose Instance State, Stop instance, and then select Stop.

5.    In the Storage tab, under Block devices, select the Volume ID for /dev/sda1 or /dev/xvda.

Note: The root device differs by AMI, but /dev/xvda or /dev/sda1 are reserved for the root device. For example, Amazon Linux 1 and 2 use /dev/xvda. Other distributions, such as Ubuntu 14, 16, 18, CentOS 7, and RHEL 7.5, use /dev/sda1.

6.    Choose Actions, Detach Volume, and then select Yes, Detach. Note the Availability Zone.

Note: You can tag the EBS volume before detaching it to help identify it in later steps.

7.    Launch a rescue EC2 instance in the same Availability Zone.

Note: Depending on the product code, you might be required to launch an EC2 instance of the same OS type. For example, if the impaired EC2 instance is a paid RHEL AMI, you must launch an AMI with the same product code. For more information, see Get the product code for your instance.

If the original instance runs SELinux (for example, RHEL or CentOS 7 or 8), then launch the rescue instance from an AMI that uses SELinux. If you select an AMI running a different OS, such as Amazon Linux 2, then any file that you modify on the original instance's volume will have broken SELinux labels.

8.    After the rescue instance launches, choose Volumes from the navigation pane, and then choose the detached root volume of the impaired instance.

9.    Choose Actions, Attach Volume.

10.    Choose the rescue instance ID (i-xxxxx), and then specify an unused device name. In this example, /dev/sdf.

11.     Use SSH to connect to the rescue instance.

12.    Run the lsblk command to view your available disk devices:

lsblk

The following is an example of the output:

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   15G  0 disk
└─xvda1 202:1    0   15G  0 part /
xvdf    202:80   0   15G  0 disk
└─xvdf1 202:81   0   15G  0 part

Note: Nitro-based instances expose EBS volumes as NVMe block devices. The output generated by the lsblk command on Nitro-based instances shows the disk names as nvme[0-26]n1. For more information, see Amazon EBS and NVMe on Linux instances. The following is an example of the lsblk command output on a Nitro-based instance:

NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0    8G  0 disk
├─nvme0n1p1   259:1    0    8G  0 part /
└─nvme0n1p128 259:2    0    1M  0 part
nvme1n1       259:3    0  100G  0 disk
└─nvme1n1p1   259:4    0  100G  0 part

13.    Run the following command to become root:

sudo -i

14.    Mount the root partition of the attached volume to /mnt. In the preceding example, /dev/xvdf1 or /dev/nvme1n1p1 is the root partition of the attached volume. For more information, see Make an Amazon EBS volume available for use on Linux. In the following example, replace /dev/xvdf1 with the correct root partition for your volume.

mount -o nouuid /dev/xvdf1 /mnt

Note: If /mnt doesn't exist on your configuration, create a mount directory, and then mount the root partition of the mounted volume to this new directory. 

mkdir /mnt
mount -o nouuid /dev/xvdf1 /mnt

You can now access the data of the impaired instance through the mount directory.

15.    Mount /dev, /run, /proc, and /sys of the rescue instance to the same paths as the newly mounted volume:

for m in dev proc run sys; do mount -o bind {,/mnt}/$m; done
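The one-liner above relies on bash brace expansion: {,/mnt}/$m expands to "/$m /mnt/$m", so each pass bind mounts a rescue-instance directory onto the same path under /mnt. You can preview the exact commands it runs by echoing them first:

```shell
# Preview the bind mounts without running them: in bash, {,/mnt}/$m
# expands to "/$m /mnt/$m"
for m in dev proc run sys; do echo mount -o bind {,/mnt}/$m; done
# prints: mount -o bind /dev /mnt/dev   (and similarly for proc, run, sys)
```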

16.    Run the chroot command to change into the mount directory.

Note: If you have a separate /boot partition, mount it to /mnt/boot before running the following command.

chroot /mnt

Update the default kernel in the GRUB bootloader

The current corrupt kernel is at position 0 (zero) in the list, and the last stable kernel is at position 1. To replace the corrupt kernel with the stable kernel, use one of the following procedures, based on your distribution:

  • GRUB1 (Legacy GRUB) for Red Hat 6 and Amazon Linux 1
  • GRUB2 for Ubuntu 14 LTS, 16.04, and 18.04
  • GRUB2 for RHEL 7 and Amazon Linux 2
  • GRUB2 for RHEL 8 and CentOS 8
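If you aren't sure which procedure applies, you can check which bootloader layout the mounted volume uses from inside the chroot. The following helper is a hypothetical sketch (not part of the AWS procedure); the paths it tests correspond to the configuration files used in the sections that follow:

```shell
# Hypothetical helper: given the path where the impaired root volume is
# mounted (or "/" inside the chroot), report which GRUB layout it uses.
detect_grub_style() {
    root="$1"
    # Check BLS entries first: RHEL 8 / CentOS 8 also ship a grub.cfg
    if [ -d "$root/boot/loader/entries" ]; then
        echo "GRUB2 with BLS entries (RHEL 8 / CentOS 8)"
    elif [ -f "$root/boot/grub2/grub.cfg" ]; then
        echo "GRUB2 (RHEL 7 / Amazon Linux 2)"
    elif [ -f "$root/boot/grub/grub.cfg" ]; then
        echo "GRUB2 (Ubuntu)"
    elif [ -f "$root/boot/grub/grub.conf" ]; then
        echo "GRUB1 / Legacy GRUB (Red Hat 6 / Amazon Linux 1)"
    else
        echo "unknown"
    fi
}

# Example against a mock volume that mimics an Amazon Linux 2 layout
mkdir -p /tmp/mockroot/boot/grub2
touch /tmp/mockroot/boot/grub2/grub.cfg
detect_grub_style /tmp/mockroot    # prints: GRUB2 (RHEL 7 / Amazon Linux 2)
```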

GRUB1 (Legacy GRUB) for Red Hat 6 and Amazon Linux 1

Use the sed command to replace the corrupt kernel with the stable kernel in the /boot/grub/grub.conf file:

sed -i '/^default/ s/0/1/' /boot/grub/grub.conf
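The expression only edits the line that begins with default, changing the first 0 on that line to 1. To preview the effect before touching the real file, you can run the same substitution against a sample grub.conf (the contents below are illustrative, not from a real instance):

```shell
# Illustrative Legacy GRUB config with the corrupt kernel at index 0
cat > /tmp/grub.conf <<'EOF'
default=0
timeout=1
title Amazon Linux (4.14.311-233.529.amzn2.x86_64)
title Amazon Linux (4.14.309-231.529.amzn2.x86_64)
EOF

# Same substitution as the resolution step: boot the entry at index 1
sed -i '/^default/ s/0/1/' /tmp/grub.conf

grep '^default' /tmp/grub.conf    # prints: default=1
```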

GRUB2 for Ubuntu 14 LTS, 16.04, and 18.04

1.    Replace the corrupt GRUB_DEFAULT=0 default menu entry with the stable GRUB_DEFAULT=saved value in the /etc/default/grub file:

sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub

2.    Run the update-grub command so that GRUB recognizes the change:

update-grub

3.    Run the grub-set-default command so that the stable kernel loads at the next reboot. In this example, the default is set to the kernel at index 1:

grub-set-default 1
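With GRUB_DEFAULT=saved, GRUB reads the default menu entry from the grubenv environment block, and grub-set-default 1 records index 1 there. After the command runs, /boot/grub/grubenv contains a line similar to the following (illustrative):

```
# GRUB Environment Block
saved_entry=1
```

You can confirm the recorded value from inside the chroot with grub-editenv list.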

GRUB2 for RHEL 7 and Amazon Linux 2

1.    Replace the corrupt GRUB_DEFAULT=0 default menu entry with the stable GRUB_DEFAULT=saved value in the /etc/default/grub file:

sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub

2.    Update GRUB to regenerate the /boot/grub2/grub.cfg file:

grub2-mkconfig -o /boot/grub2/grub.cfg

3.    Run the grub2-set-default command so that the stable kernel loads at the next reboot. In this example, the default is set to the kernel at index 1:

grub2-set-default 1

GRUB2 for RHEL 8 and CentOS 8

GRUB2 in RHEL 8 and CentOS 8 uses blscfg files and entries in /boot/loader for the boot configuration, instead of the previous grub.cfg format. It's a best practice to use the grubby tool to manage the blscfg files and to retrieve information from /boot/loader/entries/. Kernel indexing depends on the .conf files located under /boot/loader/entries and on the kernel versions; indexing keeps the latest kernel at the lowest index.

If the blscfg files are missing from this location or are corrupted, then grubby doesn't show any results, and you must regenerate the files to recover functionality. For information on how to regenerate BLS configuration files, see How can I recover my Red Hat 8 or CentOS 8 instance that is failing to boot due to issues with the Grub2 BLS configuration file?
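Each file under /boot/loader/entries/ is a small key-value BLS snippet, one per kernel. An illustrative entry (field values are hypothetical, modeled on the grubby output shown later in this section) looks like the following:

```
title Red Hat Enterprise Linux (4.18.0-305.el8.x86_64) 8.4 (Ootpa)
version 4.18.0-305.el8.x86_64
linux /boot/vmlinuz-4.18.0-305.el8.x86_64
initrd /boot/initramfs-4.18.0-305.el8.x86_64.img $tuned_initrd
options ro console=ttyS0,115200n8 console=tty0 crashkernel=auto $tuned_params
grub_users $grub_users
grub_arg --unrestricted
grub_class kernel
```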

1.    Run the grubby --default-kernel command to see the current default kernel:

grubby --default-kernel

2.    Run the grubby --info=ALL command to see all available kernels and their indexes:

grubby --info=ALL

The following is example output from the --info=ALL command:

[root@ip-172-31-29-221 /]# grubby --info=ALL
index=0
kernel="/boot/vmlinuz-4.18.0-305.el8.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto $tuned_params"
root="UUID=d35fe619-1d06-4ace-9fe3-169baad3e421"
initrd="/boot/initramfs-4.18.0-305.el8.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-305.el8.x86_64) 8.4 (Ootpa)"
id="0c75beb2b6ca4d78b335e92f0002b619-4.18.0-305.el8.x86_64"
index=1
kernel="/boot/vmlinuz-0-rescue-0c75beb2b6ca4d78b335e92f0002b619"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto"
root="UUID=d35fe619-1d06-4ace-9fe3-169baad3e421"
initrd="/boot/initramfs-0-rescue-0c75beb2b6ca4d78b335e92f0002b619.img"
title="Red Hat Enterprise Linux (0-rescue-0c75beb2b6ca4d78b335e92f0002b619) 8.4 (Ootpa)"
id="0c75beb2b6ca4d78b335e92f0002b619-0-rescue"
index=2
kernel="/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau nvme_core.io_timeout=4294967295 crashkernel=auto $tuned_params"
root="UUID=d35fe619-1d06-4ace-9fe3-169baad3e421"
initrd="/boot/initramfs-4.18.0-305.3.1.el8_4.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-305.3.1.el8_4.x86_64) 8.4 (Ootpa)"
id="ec2fa869f66b627b3c98f33dfa6bc44d-4.18.0-305.3.1.el8_4.x86_64"

Note the path of the kernel that you want to set as the default for your instance. In the preceding example, the path for the kernel at index 2 is /boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64.

3.    Run the grubby --set-default command to change the default kernel of the instance:

grubby --set-default=/boot/vmlinuz-4.18.0-305.3.1.el8_4.x86_64

Note: Replace 4.18.0-305.3.1.el8_4.x86_64 with your kernel's version number.

4.    Run the grubby --default-kernel command to verify that the preceding command worked:

grubby --default-kernel

If you're accessing the instance using the EC2 Serial Console, then the stable kernel now loads and you can reboot the instance.

If you're using a rescue instance, then complete the steps in the following section.

Unmount volumes, detach the root volume from the rescue instance, and then attach the volume to the impaired instance

Note: Complete the following steps if you used Method 2: Use a rescue instance to access the root volume.

1.    Exit from chroot, and unmount /dev, /run, /proc, and /sys:

exit
umount /mnt/{dev,proc,run,sys,}
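Note the trailing comma in the brace expression: /mnt/{dev,proc,run,sys,} also expands to /mnt/ itself, so the command unmounts the root volume last, after the bind mounts. You can preview the expansion safely with echo:

```shell
# Preview what umount receives; the trailing comma adds /mnt/ itself,
# so the root volume is unmounted after the bind mounts
echo /mnt/{dev,proc,run,sys,}
# prints: /mnt/dev /mnt/proc /mnt/run /mnt/sys /mnt/
```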

2.    From the Amazon EC2 console, choose Instances, and then choose the rescue instance.

3.    Choose Instance State, Stop instance, and then select Yes, Stop.

4.    Detach the root volume (the volume from the impaired instance) from the rescue instance.

5.    Attach the root volume you detached in step 4 to the impaired instance as the root volume (/dev/sda1), and then start the instance.

Note: The root device differs by AMI. The names /dev/xvda or /dev/sda1 are reserved for the root device. For example, Amazon Linux 1 and 2 use /dev/xvda. Other distributions, such as Ubuntu 14, 16, 18, CentOS 7, and RHEL 7.5, use /dev/sda1.

The stable kernel now loads and your instance reboots.


AWS OFFICIAL
Updated 3 years ago