
How do I troubleshoot status check failures on my Amazon EC2 Windows instance?


My Amazon Elastic Compute Cloud (Amazon EC2) Windows instance is unreachable and fails its status checks.

Short description

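Amazon EC2 uses three status checks to monitor the health of EC2 instances: system status checks, which monitor the AWS systems that the instance runs on; instance status checks, which monitor the software and network configuration of the individual instance; and attached EBS status checks, which monitor whether the Amazon EBS volumes attached to the instance are reachable and can complete I/O operations.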

Resolution

To identify the status check that failed, view the instance's status check metrics. To troubleshoot system status check or attached EBS status check failures, see How do I troubleshoot status check failures on my Amazon EC2 instance?
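You can also read the check results from the AWS CLI with aws ec2 describe-instance-status --instance-ids <instance-id>. The following Python sketch takes one entry of the InstanceStatuses list in that output and reports which checks are impaired. It assumes the entry uses the SystemStatus, InstanceStatus, and AttachedEbsStatus fields, and it treats a missing field (for example, on older API responses that lack AttachedEbsStatus) as not failed.

```python
# Sketch: report which status checks are failing for one instance.
# Input: one element of the "InstanceStatuses" list that DescribeInstanceStatus
# returns (for example, from "aws ec2 describe-instance-status --output json").
# Assumption: a missing field (such as "AttachedEbsStatus") means "not failed".

CHECK_FIELDS = {
    "SystemStatus": "system status check",
    "InstanceStatus": "instance status check",
    "AttachedEbsStatus": "attached EBS status check",
}


def failed_checks(instance_status: dict) -> list[str]:
    """Return the labels of checks whose overall Status is 'impaired'."""
    failed = []
    for field, label in CHECK_FIELDS.items():
        # Other possible values include "ok", "initializing", and "insufficient-data".
        status = instance_status.get(field, {}).get("Status")
        if status == "impaired":
            failed.append(label)
    return failed


if __name__ == "__main__":
    sample = {  # hypothetical response entry
        "InstanceId": "i-1234567890abcdef0",
        "SystemStatus": {"Status": "ok"},
        "InstanceStatus": {"Status": "impaired"},
    }
    print(failed_checks(sample))  # ['instance status check']
```

If only the instance status check is listed, continue with the resolutions in this article.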

If the instance status check failed, then check the instance's system logs to see the cause of the failure. Then, take the following actions based on the issue that you encounter.

Important: Some of the following resolutions require you to stop and start the instance.

Configure your instance for a stop and start

Note: When you stop and start an instance, the instance's public IP address changes. It's a best practice to use an Elastic IP address to route external traffic to your instance instead of a public IP address. If you use Amazon Route 53, then you might need to update the Route 53 DNS records when the public IP address changes. A stop and start is different from an instance reboot. For more information, see How EC2 instance stop and start works.

Before you stop and start your instance, take the following actions:

Troubleshoot an OS that fails to boot

Use the instance screenshot to identify the boot issue. If the screenshot shows a blue screen with error 0xc000000e, then see Disk signature collision.

If you receive a "Stuck Windows Update (error C0000034)" error, then complete the following steps to troubleshoot:

  1. Launch a rescue instance.
  2. Stop the original instance.
  3. Detach the root volume from the original instance.
  4. Attach the volume to the rescue instance.
  5. Run the following command on the rescue instance to remove pending actions:
    DISM /image:D:\ /cleanup-image /revertpendingactions
    Note: Replace D: with your volume's drive letter.
    If the command can't access the attached volume, then make sure that the volume is online. Run the following command to open Disk Management:
    diskmgmt.msc
    Open the context (right-click) menu for the disk, choose Online, and then rerun the DISM command.
  6. Unmount and detach the volume from the rescue instance.
  7. Attach the volume to the original instance as the /dev/sda1 root volume.
  8. Start the original instance.
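You can script the AWS-side steps of this procedure with the AWS CLI. The following Python sketch only builds the command lines and doesn't run them, because the DISM step must happen on the rescue instance between the two phases. The instance IDs, the volume ID, and the xvdf device name on the rescue instance are hypothetical placeholders; the /dev/sda1 root device comes from step 7.

```python
# Sketch: build (but don't run) the AWS CLI calls for the rescue round trip.
# All IDs are hypothetical; "xvdf" is an assumed free device name on the rescue instance.


def to_rescue_commands(original_id, rescue_id, volume_id, rescue_device="xvdf"):
    """Steps 2-4: stop the original instance and move its root volume to the rescue instance."""
    return [
        ["aws", "ec2", "stop-instances", "--instance-ids", original_id],
        ["aws", "ec2", "wait", "instance-stopped", "--instance-ids", original_id],
        ["aws", "ec2", "detach-volume", "--volume-id", volume_id],
        ["aws", "ec2", "wait", "volume-available", "--volume-ids", volume_id],
        ["aws", "ec2", "attach-volume", "--volume-id", volume_id,
         "--instance-id", rescue_id, "--device", rescue_device],
    ]


def back_to_original_commands(original_id, volume_id, root_device="/dev/sda1"):
    """Steps 6-8: run only after DISM finishes and the disk is offline on the rescue instance."""
    return [
        ["aws", "ec2", "detach-volume", "--volume-id", volume_id],
        ["aws", "ec2", "wait", "volume-available", "--volume-ids", volume_id],
        ["aws", "ec2", "attach-volume", "--volume-id", volume_id,
         "--instance-id", original_id, "--device", root_device],
        ["aws", "ec2", "start-instances", "--instance-ids", original_id],
    ]


if __name__ == "__main__":
    for argv in to_rescue_commands("i-1234567890abcdef0", "i-0fedcba0987654321",
                                   "vol-1234567890abcdef0"):
        print(" ".join(argv))
```

Splitting the sequence into two functions keeps the manual DISM step as an explicit pause. To execute a phase, pass each argument list to subprocess.run(argv, check=True).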

If you see 0xc000014c, 0xc000000f, or "Windows failed to start" errors, then the issue might be system file or registry corruption. To resolve this issue, use EC2Rescue for Windows to restore to the Last Known Good Configuration. If you still encounter issues, then restore the instance from a recent Amazon Machine Image (AMI) or Amazon EBS snapshot.
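If you review several screenshots or system logs, a small lookup table can route each error code to the matching resolution. The following Python sketch covers only the codes named in this section; the hint strings are illustrative wording, not AWS output, and it assumes that you already have the screenshot or log as text.

```python
# Sketch: map error codes found in screenshot text or a system log to the
# resolution that this section suggests. Only the codes named above are covered.
import re

CORRUPTION = "System file or registry corruption: use EC2Rescue (Last Known Good Configuration)"

REMEDIATION = {
    "c000000e": "Blue screen: see Disk signature collision",
    "c0000034": "Stuck Windows Update: revert pending actions with DISM",
    "c000014c": CORRUPTION,
    "c000000f": CORRUPTION,
}

# Matches codes written with or without the 0x prefix, such as 0xc000000e or C0000034.
CODE_RE = re.compile(r"\b(?:0x)?(c[0-9a-f]{7})\b", re.IGNORECASE)


def triage(text: str) -> list[str]:
    """Return the distinct resolution hints for codes in text, in order of appearance."""
    hints = []
    for match in CODE_RE.finditer(text):
        hint = REMEDIATION.get(match.group(1).lower())
        if hint and hint not in hints:
            hints.append(hint)
    # The "Windows failed to start" message points to the same corruption fix.
    if "windows failed to start" in text.lower() and CORRUPTION not in hints:
        hints.append(CORRUPTION)
    return hints
```

An empty result means that the text contains none of the listed codes, so compare the screenshot against the other sections of this article.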

Troubleshoot EBS volumes that didn't mount correctly

Complete the following steps:

  1. Launch a rescue instance.

  2. Stop the original instance.

  3. Detach the root volume from the original instance.

  4. Attach the volume to the rescue instance.

  5. Run the following command to open Disk Management:

    diskmgmt.msc
  6. If the attached volume is Offline, then open the context (right-click) menu for the volume, and then choose Online.

  7. If a disk signature collision occurs, then run EC2Rescue for Windows to detect and fix the issue.

  8. Make sure that the instance has the AWS Non-Volatile Memory Express (NVMe) driver installed.

  9. Unmount and detach the volume from the rescue instance.

  10. Attach the volume to the original instance.

  11. Start the original instance.
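A disk signature collision often happens when the rescue instance was launched from the same AMI as the original instance, so both disks report the same signature. EC2Rescue for Windows is the tool that fixes it. The following Python sketch only shows the detection idea: it assumes that you already collected (disk number, signature) pairs on the rescue instance, for example from the Number and Signature columns of Get-Disk.

```python
# Sketch: detect disk signature collisions from (disk number, signature) pairs.
# Assumption: the pairs were collected on the rescue instance (for example, from
# Get-Disk). Detection only; use EC2Rescue for Windows to fix a collision.
from collections import defaultdict
from typing import Optional


def signature_collisions(disks: list[tuple[int, Optional[int]]]) -> dict[int, list[int]]:
    """Map each signature shared by two or more disks to those disk numbers."""
    by_signature = defaultdict(list)
    for number, signature in disks:
        if signature:  # skip disks that report no MBR signature (None or 0)
            by_signature[signature].append(number)
    return {sig: numbers for sig, numbers in by_signature.items() if len(numbers) > 1}


if __name__ == "__main__":
    # Hypothetical values: disk 1 is the attached root volume from the same AMI.
    print(signature_collisions([(0, 0x1A2B3C4D), (1, 0x1A2B3C4D), (2, 0x5E6F7A8B)]))
```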

Troubleshoot high CPU and memory

Check the CPUUtilization metric in Amazon CloudWatch. If the value is at or near 100%, then the instance doesn't have enough compute capacity. For burstable instances such as T2 and T3, also check the CPUCreditBalance metric. In standard mode, when the CPU credit balance reaches zero, Amazon EC2 limits the instance to its baseline performance level. For example, if the baseline is 20%, then the instance's CPU usage doesn't exceed 20%.
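The cap follows from CPU credit accounting: one CPU credit equals one vCPU running at 100% utilization for one minute, and the baseline corresponds to the rate at which the instance earns credits. The following Python sketch estimates how long a credit balance lasts at a steady average utilization. It's a simplification: it ignores launch credits, the maximum accrued balance, and unlimited mode, and the example figures (2 vCPUs that earn 12 credits per hour) are hypothetical.

```python
# Sketch: CPU credit arithmetic for a burstable instance in standard mode.
# Simplified: ignores launch credits and the cap on accrued credits.
import math


def baseline_pct(credits_per_hour: float, vcpus: int) -> float:
    """Baseline utilization per vCPU, in percent (credits earned per minute / vCPUs)."""
    return credits_per_hour * 100 / (60 * vcpus)


def minutes_until_exhausted(balance: float, vcpus: int, utilization_pct: float,
                            credits_per_hour: float) -> float:
    """Minutes until the balance reaches zero; math.inf if credits aren't draining.

    utilization_pct is the average CPUUtilization across all vCPUs.
    """
    spend_per_minute = vcpus * utilization_pct / 100  # 1 credit = 1 vCPU at 100% for 1 minute
    earn_per_minute = credits_per_hour / 60
    net_drain = spend_per_minute - earn_per_minute
    return math.inf if net_drain <= 0 else balance / net_drain


if __name__ == "__main__":
    # Hypothetical instance: 2 vCPUs that earn 12 credits per hour.
    print(baseline_pct(12, 2))                      # 10.0
    print(minutes_until_exhausted(60, 2, 100, 12))  # about 33.3 minutes
```

At or below the baseline, the instance earns credits at least as fast as it spends them, so the balance never runs out.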

To check memory usage, use the CloudWatch agent. Or, connect to the instance through EC2 Serial Console or Session Manager, a capability of AWS Systems Manager. Then, run the following command to identify the top memory-consuming processes:

Get-Process | Sort-Object WorkingSet64 -Descending | Select-Object -First 10 Name, @{N='MemMB';E={[math]::Round($_.WorkingSet64/1MB)}}
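If you use the CloudWatch agent for memory metrics, a minimal configuration collects one Windows memory counter. The following Python sketch generates that configuration as JSON. It assumes the agent's Windows performance-counter format (the Memory object with the % Committed Bytes In Use counter) and a 60-second interval; verify the field names against the CloudWatch agent documentation before you deploy the file.

```python
# Sketch: generate a minimal CloudWatch agent config that publishes Windows memory use.
# Assumption: the agent's Windows performance-counter format, with the "Memory"
# object and its "% Committed Bytes In Use" counter.
import json


def memory_agent_config(interval_seconds: int = 60) -> dict:
    """Return a config dict that collects one memory counter every interval_seconds."""
    return {
        "metrics": {
            # Tag each data point with the instance ID so that you can filter by instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "Memory": {
                    "measurement": ["% Committed Bytes In Use"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(memory_agent_config(), indent=2))
```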

If CPU usage is near 100% or memory is exhausted, then stop and start the instance.

If the issue persists, then change to a larger instance type or switch your burstable instances to unlimited mode.
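Both remedies map to AWS CLI calls. The following Python sketch builds them without running them; the instance ID and the m5.large type are hypothetical placeholders. The JSON arguments follow the documented JSON syntax for these options. You must stop the instance before you change its type, and unlimited mode can incur charges for surplus credits.

```python
# Sketch: build (but don't run) the AWS CLI calls for the two remedies above.
# The instance ID and instance type are hypothetical placeholders.
import json


def unlimited_mode_args(instance_id: str) -> list[str]:
    """Switch a burstable instance to unlimited mode."""
    spec = [{"InstanceId": instance_id, "CpuCredits": "unlimited"}]
    return ["aws", "ec2", "modify-instance-credit-specification",
            "--instance-credit-specifications", json.dumps(spec)]


def change_type_args(instance_id: str, instance_type: str) -> list[str]:
    """Change the instance type; the instance must be stopped first."""
    return ["aws", "ec2", "modify-instance-attribute",
            "--instance-id", instance_id,
            "--instance-type", json.dumps({"Value": instance_type})]


if __name__ == "__main__":
    print(unlimited_mode_args("i-1234567890abcdef0"))
    print(change_type_args("i-1234567890abcdef0", "m5.large"))
```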

Troubleshoot full disk issues

Complete the following steps:

  1. Launch a rescue instance.
  2. Stop the original instance.
  3. Detach the root volume from the original instance.
  4. Attach the volume to the rescue instance.
  5. To free disk space, remove unnecessary files, clear temporary folders, and clean up Windows Update files.
  6. If you still require more disk space, then modify the EBS volume to increase its size. Then, run the following command to extend the partition:
    $MaxSize = (Get-PartitionSupportedSize -DriveLetter D).SizeMax
    Resize-Partition -DriveLetter D -Size $MaxSize
    Note: Replace D with the volume's drive letter.
  7. Unmount and detach the volume from the rescue instance.
  8. Attach the volume to the original instance as a root volume.
  9. Start the original instance.
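For step 6, you need a new volume size. The following Python sketch picks the smallest whole-GiB size that leaves a target share of free space and builds the matching modify-volume call. The 20% default target and the volume ID are assumptions for illustration, not AWS requirements. EBS volumes can grow but not shrink, so the sketch returns None when the current size is already large enough.

```python
# Sketch: choose a new EBS volume size that leaves a target share of free space.
# Assumption: the 20% free-space default is an example, not an AWS requirement.
import math
from typing import Optional


def required_size_gib(used_gib: float, min_free_fraction: float = 0.2) -> int:
    """Smallest whole-GiB size whose free space is at least min_free_fraction."""
    if not 0 <= min_free_fraction < 1:
        raise ValueError("min_free_fraction must be in [0, 1)")
    return math.ceil(used_gib / (1 - min_free_fraction))


def modify_volume_args(volume_id: str, current_gib: int, used_gib: float,
                       min_free_fraction: float = 0.2) -> Optional[list[str]]:
    """Return the modify-volume call, or None if the volume is already big enough."""
    target = required_size_gib(used_gib, min_free_fraction)
    if target <= current_gib:
        return None  # EBS volumes can't shrink, so there is nothing to do
    return ["aws", "ec2", "modify-volume", "--volume-id", volume_id, "--size", str(target)]
```

After the call, wait until aws ec2 describe-volumes-modifications reports the optimizing or completed state, and then run the Resize-Partition command from step 6.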

Troubleshoot network configuration failures

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Even if the screenshot shows the login screen (Ctrl+Alt+Del), the instance might still fail its status check. In this scenario, check the Network status icon for the Limited connectivity or No internet status to identify network issues.

To automatically check and fix network configurations that cause Remote Desktop Protocol (RDP) issues, run the following start-automation-execution AWS CLI command to start AWSSupport-TroubleshootRDP:

aws ssm start-automation-execution \
   --document-name "AWSSupport-TroubleshootRDP" \
   --parameters "InstanceId=i-1234567890abcdef0"

Note: Replace i-1234567890abcdef0 with your instance ID.
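To run the runbook from a script, for example across several instances, the following Python sketch builds the same start-automation-execution call for AWSSupport-TroubleshootRDP and validates the instance ID first. The ID pattern (i- followed by 8 or 17 lowercase hexadecimal characters) is an assumption based on current EC2 ID formats.

```python
# Sketch: build (but don't run) the AWSSupport-TroubleshootRDP automation call.
# Assumption: instance IDs are "i-" plus 8 or 17 lowercase hexadecimal characters.
import re

INSTANCE_ID_RE = re.compile(r"i-(?:[0-9a-f]{8}|[0-9a-f]{17})")


def troubleshoot_rdp_args(instance_id: str) -> list[str]:
    """Return the argv for start-automation-execution, or raise ValueError for a bad ID."""
    if not INSTANCE_ID_RE.fullmatch(instance_id):
        raise ValueError(f"not an EC2 instance ID: {instance_id!r}")
    return ["aws", "ssm", "start-automation-execution",
            "--document-name", "AWSSupport-TroubleshootRDP",
            "--parameters", f"InstanceId={instance_id}"]


if __name__ == "__main__":
    print(" ".join(troubleshoot_rdp_args("i-1234567890abcdef0")))
```

To execute the call, pass the list to subprocess.run(argv, check=True). The command's JSON output includes an AutomationExecutionId that you can pass to aws ssm get-automation-execution to track progress.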

Or, to manually resolve network configuration issues, complete the following steps:

  1. Create a secondary elastic network interface.

  2. Attach the secondary network interface to the instance.

  3. Use RDP to connect to the instance with the IP address of the secondary network interface.

  4. Run the following command to open Network Connections:

    ncpa.cpl
  5. If the primary network adapter is Disabled, then open the context (right-click) menu for the adapter.

  6. Choose Enable.

  7. Make sure that Windows Firewall allows inbound TCP traffic on port 3389. Also, run the following command to verify that Remote Desktop Services is running:

    Get-Service -Name TermService

    If the service is stopped, then run the following command to start it:

    Start-Service -Name TermService
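After you make these changes, you can confirm from your client computer that the RDP port accepts TCP connections. The following Python sketch uses only the standard library. A successful connection means that the security group, network ACL, Windows Firewall, and Remote Desktop Services all allow the connection; a failure doesn't tell you which layer blocks it. The default port and the 3-second timeout are assumptions that you can change.

```python
# Sketch: check from a client whether a host accepts TCP connections on the RDP port.
import socket


def rdp_port_open(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False
```

For example, rdp_port_open("203.0.113.10") checks a hypothetical address. Use the instance's public or Elastic IP address, or the secondary network interface's address from step 3.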

(Nitro instances only) Troubleshoot driver incompatibility

If RDP fails after you migrate a Xen-based instance to an instance that's built on the AWS Nitro System, then the likely cause is driver incompatibility. To resolve this issue, upgrade your instance's drivers, for example with the AWSSupport-UpgradeWindowsAWSDrivers automation runbook.

Note: You can't use Paravirtual (PV) drivers on Nitro instances. You must install the Elastic Network Adapter (ENA) and NVMe drivers before you change the instance type.
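To check whether a target instance type is built on the Nitro System before you change types, run aws ec2 describe-instance-types --instance-types <type> --query "InstanceTypes[].Hypervisor", which returns nitro or xen. The following Python sketch turns that value into a pre-change checklist. The driver labels "ena" and "nvme" are hypothetical names for whatever inventory you collect on the instance; the sketch doesn't detect drivers itself.

```python
# Sketch: list the drivers to install before moving a Windows instance to a new type.
# Assumptions: target_hypervisor is the Hypervisor value from DescribeInstanceTypes,
# and "ena"/"nvme" are hypothetical labels from your own driver inventory.
from typing import Optional

NITRO_REQUIRED_DRIVERS = ("ena", "nvme")


def drivers_to_install(target_hypervisor: Optional[str], installed: set[str]) -> list[str]:
    """Return required drivers that are missing when the target type is Nitro-based."""
    # Only the value "nitro" triggers the check; review any other case manually.
    if (target_hypervisor or "").lower() != "nitro":
        return []
    return [driver for driver in NITRO_REQUIRED_DRIVERS if driver not in installed]
```

An empty list for a Nitro target means that your inventory already covers the ENA and NVMe requirement.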

Related information

Troubleshoot issues with Amazon EC2 Windows instances

Troubleshoot impaired Amazon EC2 Windows instance using EC2Rescue

AWS OFFICIAL | Updated 23 days ago