Why is my EC2 Linux instance unreachable and failing its status checks?
My Amazon Elastic Compute Cloud (Amazon EC2) Linux instance is unreachable and fails one or both of its status checks.
Amazon EC2 uses two status checks to monitor the health of EC2 instances:
System status check
The system status check detects issues with an instance's underlying hardware. If the underlying hardware is unresponsive or unreachable because of network, hardware, or software issues, then the system status check fails.
Instance status check
An instance status check failure indicates that the instance is unreachable. The following common issues cause an instance status check failure:
- Failure to boot the operating system (OS)
- Failure to correctly mount the volumes
- Exhausted CPU and memory
- Kernel panic
- Network failure
Warning: Some of the following resolutions require an instance stop and start. Before stopping and starting your instance, note these conditions:
- Data that's stored in instance store volumes is lost when the instance is stopped. Before you stop the instance, make sure that you back up the data. Unlike Amazon Elastic Block Store (Amazon EBS)-backed volumes, instance store volumes are ephemeral and don't support data persistence.
- The public IPv4 address that Amazon EC2 automatically assigned to the instance on launch or start changes after the stop and start. To retain a public IPv4 address that doesn't change when the instance is stopped, use an Elastic IP address.
For more information, see Prerequisites for stopping an instance.
To determine if the instance status check or system status check failed, view the instance's status check metrics.
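The two check results can also be read from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The helper name which_check_failed and the instance ID in the usage comment are hypothetical placeholders.

```shell
# Hypothetical helper: report which status check is "impaired".
# $1 = system status, $2 = instance status (for example "ok", "impaired",
# "initializing", or "insufficient-data").
which_check_failed() {
    system_status=$1
    instance_status=$2
    if [ "$system_status" = "impaired" ] && [ "$instance_status" = "impaired" ]; then
        echo "both"
    elif [ "$system_status" = "impaired" ]; then
        echo "system"
    elif [ "$instance_status" = "impaired" ]; then
        echo "instance"
    else
        echo "none"
    fi
}

# Example usage (the instance ID is a placeholder):
#   set -- $(aws ec2 describe-instance-status \
#       --instance-ids i-1234567890abcdef0 --include-all-instances \
#       --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
#       --output text)
#   which_check_failed "$1" "$2"
```

The helper treats only "impaired" as a failure, so an instance that is still initializing isn't reported as failed.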
If the system status check failed, then see My EC2 Linux instance failed its system status check. How do I troubleshoot this?
If the instance status check failed, then check the instance's system logs to determine the cause of the failure. Then, use one of the following resolutions to resolve the issue.
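As a rough first pass, the system log can be scanned for the error messages that this article quotes. The following sketch assumes that the log text is available on standard input. The helper name classify_system_log and the category labels are hypothetical, and the patterns cover only the example messages shown in this article.

```shell
# Hypothetical helper: read a system log on standard input and print one
# line per failure category whose example message appears in the log.
classify_system_log() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q -e 'Failed to mount' -e 'Dependency failed' \
        && echo "mount-failure"
    printf '%s\n' "$log" | grep -q 'Out of memory' && echo "out-of-memory"
    printf '%s\n' "$log" | grep -q 'No space left on device' && echo "disk-full"
    printf '%s\n' "$log" | grep -q 'Kernel panic' && echo "kernel-panic"
    return 0
}

# Example usage (assumes a configured AWS CLI; the instance ID is a placeholder):
#   aws ec2 get-console-output --instance-id i-1234567890abcdef0 \
#       --query Output --output text | classify_system_log
```

No output means only that none of these example messages were found. It doesn't confirm that the log is free of errors.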
Failure to boot the OS
If the system logs contain boot errors, then see How do I troubleshoot an EC2 Linux instance that failed the instance status check due to operating system issues?
Failure to correctly mount the volumes
Mount point failure might cause the instance status check to fail.
Example mount point failure:
[FAILED] Failed to mount /
See 'systemctl status mnt-nvme0n1p1.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
For more information, see the following AWS Knowledge Center articles:
- "Dependency failed" errors: Why is my EC2 Linux instance going into emergency mode when I try to boot it?
- "Failed to mount" or "Dependency failed" errors: How do I troubleshoot an EC2 Linux instance that failed the instance status check due to operating system issues?
When you change an instance type from Xen to Nitro, the volume mount might fail. Mount failure occurs because Amazon EBS volumes are exposed as NVMe block devices on Nitro-based instances. The device names are /dev/nvme0n1, /dev/nvme1n1, and so on. Device names that you specify in a block device mapping are renamed to NVMe device names (/dev/nvme[0-26]n1). The block device driver might assign the NVMe device names in a different order from the original order that you specified in the block device mapping. To avoid mount failure on Nitro-based instances, it's a best practice to use either a label or UUID for device names. For more information, see Make an Amazon EBS volume available for use on Linux.
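To find mounts that depend on a device name, you can scan the file system table before you change the instance type. The following sketch assumes fstab-formatted text on standard input. The helper name find_device_name_mounts is hypothetical, and the check covers only the /dev/sd, /dev/xvd, and /dev/nvme name prefixes.

```shell
# Hypothetical helper: read fstab-formatted text on standard input and print
# the device and mount point of each entry whose first field is a raw device
# name (for example /dev/xvdf or /dev/nvme1n1) rather than UUID=... or LABEL=...
find_device_name_mounts() {
    awk '$1 ~ /^\/dev\/(sd|xvd|nvme)/ { print $1, $2 }'
}

# Example usage on the instance:
#   find_device_name_mounts < /etc/fstab
# To look up the UUID or label to use instead, run for example:
#   sudo blkid
```

Each entry that the helper prints is a candidate to rewrite with the UUID or label of the volume.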
Exhausted CPU and memory
High CPU utilization
If the CPUUtilization metric is at or near 100%, then the instance might not have enough compute capacity to run the kernel.
For T2 or T3 instances, check the Amazon CloudWatch CPU credit metrics to determine if the CPU credits are at or near zero. If the CPU credits are at zero, then the CPUUtilization metric shows a saturation plateau at the baseline performance for the instance. The baseline performance might be 20%, 40%, and so on, depending on the instance type.
CPU utilization at or near 100%, or at a saturation plateau for T2 or T3 instances, indicates that the status check failed because of resource over-utilization. To troubleshoot this issue, see My EC2 Linux instance failed the instance status check due to over-utilization of its resources. How do I troubleshoot this?
Block device errors, software bugs, or kernel panic might cause an unusual CPU usage spike. If the CPUUtilization metric is at 100%, then check the system logs for block device errors, memory errors, or other unusual system errors. Then, reboot or stop and start the instance.
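The two CPU conditions can be sketched as a small decision helper. The helper name classify_cpu and its thresholds (95% utilization, fewer than 1 credit, and 2 percentage points around the baseline) are assumptions chosen for illustration and aren't values from AWS.

```shell
# Hypothetical helper: classify CPU pressure from metric values.
# $1 = CPUUtilization in percent
# $2 = CPUCreditBalance (optional, T2 or T3 instances only)
# $3 = baseline performance in percent (optional, T2 or T3 instances only)
classify_cpu() {
    awk -v util="$1" -v credits="${2:-}" -v base="${3:-}" 'BEGIN {
        # At or near 100%: the instance is saturated.
        if (util >= 95) { print "saturated"; exit }
        # No credits left and utilization sitting on the baseline plateau.
        if (credits != "" && base != "" && credits < 1 &&
            util >= base - 2 && util <= base + 2) {
            print "credit-exhausted"; exit
        }
        print "ok"
    }'
}

# Example usage with values read from CloudWatch:
#   classify_cpu 20.1 0 20
```

You can read the input values from the CPUUtilization and CPUCreditBalance metrics in the AWS/EC2 namespace in CloudWatch.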
Out of memory
High memory pressure might cause an instance status check failure. In the following example log entry, the OS is out of memory. To resolve this error, stop the process that's consuming the most memory.
[115879.769795] Out of memory: kill process 20273 (httpd) score 1285879 or a child
[115879.769795] Killed process 1917 (php-cgi) vsz:467184kB, anon-rss:101196kB, file-rss:204kB
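To see which processes the kernel stopped, you can extract the "Killed process" lines from the log. The following sketch assumes that the log uses the message format shown in the preceding example. The helper name list_oom_kills is hypothetical.

```shell
# Hypothetical helper: read kernel log text on standard input and print
# "PID NAME" for every process that the log reports as killed.
list_oom_kills() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Example usage on the instance:
#   dmesg | list_oom_kills
```

For the preceding example log entry, the helper prints "1917 php-cgi".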
By default, EC2 instance memory and disk metrics aren't sent to Amazon CloudWatch. However, you can use the CloudWatch agent to collect and monitor additional metrics.
To troubleshoot and resolve the out of memory issue, upgrade the instance to a larger instance type. Or, add swap storage to the instance to alleviate the memory pressure. For more information, see the following AWS Knowledge Center articles:
- How do I allocate memory to work as swap space in an Amazon EC2 instance by using a swap file?
- How do I use a partition on my hard drive to allocate memory to work as swap space on an Amazon EC2 instance?
Disk full errors
If the system logs contain disk full errors, then the instance is in emergency mode because of a full root device.
Example system log:
$: service apache2 restart
Error: No space left on device
$: /etc/init.d/mysql restart
[....] Restarting mysql (via systemctl): mysql.serviceError: No space left on device
root@example:~# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/root 7.7G 7.7G 0 100% /
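A check like the df call in the example can be automated to flag file systems that are nearly full. The following sketch assumes df-style output on standard input, with the usage percentage in the fifth column and a mount point that contains no spaces. The helper name find_full_filesystems and the default threshold of 90% are hypothetical.

```shell
# Hypothetical helper: read df output on standard input and print the mount
# point and usage of each file system at or above the threshold.
# $1 = threshold in percent (defaults to 90, an illustrative value).
find_full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example usage on the instance:
#   df -P | find_full_filesystems 90
```

For the df output in the preceding example, the helper prints "/ 100%".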
For detailed instructions on how to troubleshoot and resolve disk full errors, see the following AWS Knowledge Center articles:
- My EC2 Linux instance failed the instance status check due to over-utilization of its resources. How do I troubleshoot this?
- How do I increase the size of my EBS volume if I receive an error that there's no space left on my file system?
Kernel panic
A kernel panic occurs when the kernel detects an internal fatal error during operation. If the error occurs during the OS boot, then the kernel might not load properly. This causes an OS boot failure.
Example kernel panic error message:
Linux version 2.6.16-xenU (email@example.com) (gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)) #1 SMP Mon May 28 03:41:49 SAST 2007
Kernel command line: root=/dev/sda1 ro 4
Registering block device major 8
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)
For information on how to troubleshoot and resolve a kernel panic error, see the following AWS Knowledge Center articles:
- Why do I see a "Kernel panic" error after I upgrade the kernel or reboot my EC2 Linux instance?
- How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?
Network failure
The following common reasons might cause your network to fail.
The cloud-init package isn't installed on the instance
The cloud-init package is used to update network configurations at launch.
To correct this error, run the following command to install the cloud-init package on your instance:
$ sudo yum install cloud-init
MAC address is hardcoded in a configuration file
Hardcoded MAC addresses are in the Linux configuration files and the udev configuration files. These files are usually in the following locations:
To resolve network issues caused by a hardcoded MAC address, remove the entries or configuration files. For example, run the following command:
The IP address is hardcoded in a configuration file
When you create an Amazon Machine Image (AMI) from an instance with a statically configured IP address, the configuration file might contain a hardcoded IP address.
To correct this error, set your network interface to use DHCP.
Note: You can't update existing AMIs. You must set the network interface to use DHCP before you create a new AMI.
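Before you create the AMI, you can confirm that the interface configuration requests DHCP. The following sketch assumes a distribution that uses ifcfg-style interface files with a BOOTPROTO setting. The helper name uses_dhcp and the file path in the usage comment are hypothetical, and other distributions store this setting elsewhere.

```shell
# Hypothetical helper: read an ifcfg-style interface file on standard input
# and succeed (exit status 0) only if BOOTPROTO is set to dhcp.
uses_dhcp() {
    awk -F= '$1 == "BOOTPROTO" {
                 v = $2
                 gsub(/"/, "", v)     # drop optional double quotes
                 found = (tolower(v) == "dhcp")
             }
             END { exit(found ? 0 : 1) }'
}

# Example usage (the path is an assumption and varies by distribution):
#   uses_dhcp < /etc/sysconfig/network-scripts/ifcfg-eth0 && echo "DHCP is set"
```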
There are missing ENA or Intel-enhanced network drivers
For more information on missing Elastic Network Adapters (ENAs) or Intel-enhanced network drivers, see Enhanced networking on Linux.
The network interface is renamed at startup
To fix this issue, add net.ifnames=0 to the kernel command line to deactivate predictable network interface names. For the steps to set this variable, see the instructions for activating enhanced networking with the ENA.
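To confirm that the parameter is active after a reboot, you can inspect the kernel command line of the running instance. The following is a minimal sketch. The helper name has_ifnames_disabled is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the given kernel command
# line contains net.ifnames=0 as a separate word.
has_ifnames_disabled() {
    case " $1 " in
        *" net.ifnames=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a running instance:
#   has_ifnames_disabled "$(cat /proc/cmdline)" && echo "net.ifnames=0 is set"
```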
For more information on network issues, see Best practices for configuring network interfaces.