How do I troubleshoot an Amazon EC2 instance that stops or terminates when I try to start it with the "InternalError" or "Client.UserInitiatedShutdown" error?

When I try to start my Amazon Elastic Compute Cloud (Amazon EC2) instance, it terminates or doesn't start, and I receive the "InternalError" or "Client.UserInitiatedShutdown" error.

Short description

The following reasons are common causes of an Amazon EC2 instance "InternalError" or "Client.UserInitiatedShutdown" error:

  • Your Amazon Elastic Block Store (Amazon EBS) volume isn't attached to the instance correctly.
  • An EBS volume that's attached to the instance is in an error state.
  • An encrypted EBS volume is attached to the instance, and the entity that starts the instance doesn't have permissions to use the AWS Key Management Service (AWS KMS) key.
  • An operating system disruption, such as the audit daemon, stopped the Amazon EC2 instance a few minutes after it started.

Resolution

EBS volumes aren't attached to the instance correctly

You must attach the EBS root volume to the instance as /dev/sda1 or /dev/xvda, depending on the root device name that the API reports for the instance. No other EBS volume can use the same device name or a device name that conflicts with it. If there's a conflict, then you can't stop or start the instance. Block device name conflicts affect only Xen-based instance types (c4, m4, t2, and so on). They don't affect Nitro-based instance types (c5, m5, t3, and so on).

  1. To verify the StateReason error message and error code, run the describe-instances command:

    $ aws ec2 describe-instances --instance-ids i-nnnnnnnnnnnnnnn --region us-east-1 --query "Reservations[].Instances[].{StateReason:StateReason}" --output json

    Note: Replace us-east-1 with your AWS Region. Replace i-nnnnnnnnnnnnnnn with your instance ID.

    If there's a device name conflict, then you see an output that's similar to the following message:

    [
        [
            {
                "StateReason": {
                    "Code": "Server.InternalError",
                    "Message": "Server.InternalError: Internal error on launch"
                }
            }
        ]
    ]

  2. Open the Amazon EC2 console, and then select the instance that you can't start.

  3. On the Description tab, verify the device name that's listed in Block devices. The Block devices field displays all the device names of the attached volumes.

  4. Verify that the root device is correctly attached and that there isn't a device listed with the same name or a name that conflicts.

  5. If a volume has a duplicate or conflicting device name, then detach that volume. Then, reattach the volume with a different device name.
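You can approximate the console steps above with the AWS CLI. The following is a minimal sketch and not an official procedure. The function names and the argument order are assumptions for illustration, and you must supply your own volume ID, instance ID, device name, and Region.

```shell
# List each attached volume with its device name so that duplicate or
# conflicting names are easy to spot next to the root device.
list_block_devices() {
    local instance_id="$1" region="$2"
    aws ec2 describe-instances \
        --instance-ids "$instance_id" \
        --region "$region" \
        --query "Reservations[].Instances[].BlockDeviceMappings[].[DeviceName,Ebs.VolumeId]" \
        --output text
}

# Detach a volume, wait until it is available, and then reattach it
# under a different device name (hypothetical helper).
reattach_with_new_name() {
    local volume_id="$1" instance_id="$2" new_device="$3" region="$4"
    aws ec2 detach-volume --volume-id "$volume_id" --region "$region" &&
    aws ec2 wait volume-available --volume-ids "$volume_id" --region "$region" &&
    aws ec2 attach-volume \
        --volume-id "$volume_id" \
        --instance-id "$instance_id" \
        --device "$new_device" \
        --region "$region"
}
```

For example, `list_block_devices i-nnnnnnnnnnnnnnn us-east-1` prints one line per attached volume. If a secondary volume uses a conflicting name, then you can call `reattach_with_new_name` with an unused device name such as /dev/sdf.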

An attached EBS volume is in an error state

  1. Run the describe-instances command to verify the StateReason error message and error code:

    $ aws ec2 describe-instances --instance-ids i-nnnnnnnnnnnnnnn --region us-east-1 --query "Reservations[].Instances[].{StateReason:StateReason}" --output json

    Note: Replace us-east-1 with your AWS Region. Replace i-nnnnnnnnnnnnnnn with your instance ID.

    If there's an attached EBS volume that's in an error state, then you see an output that's similar to the following message:

    [
        [
            {
                "StateReason": {
                    "Code": "Server.InternalError",
                    "Message": "Server.InternalError: Internal error on launch"
                }
            }
        ]
    ]

  2. Open the Amazon EC2 console, choose Volumes, and then verify if the status of the volume is error. Your options vary on whether the volume is a root volume or a secondary volume.

  3. If the volume that's in an error state is a secondary volume, detach the volume, and then start the instance.

  4. If the volume that's in an error state is a root volume and you have a snapshot of the volume, then complete the following steps:
    Detach the volume.
    Create a new volume from the snapshot.
    Use the device name of the original root volume to attach the new volume to the instance, and then start the instance.

Note: If you don't have an existing snapshot of the root volume that's in an error state, then you can't restart the instance. You must launch a new instance, install the relevant applications, and then configure the new instance to replace the old instance.
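The root volume replacement above can be sketched with the AWS CLI as follows. This is an illustration under stated assumptions and not a definitive implementation. The function names are hypothetical, and the sketch assumes that the snapshot exists and that you create the new volume in the same Availability Zone as the instance.

```shell
# Print the state of a volume. A value of "error" matches the condition
# that this section describes.
volume_state() {
    aws ec2 describe-volumes --volume-ids "$1" --region "$2" \
        --query "Volumes[0].State" --output text
}

# Replace a failed root volume with a new volume created from a snapshot.
# Arguments: old volume ID, snapshot ID, instance ID, Availability Zone,
# root device name (for example /dev/xvda), Region.
replace_root_from_snapshot() {
    local old_vol="$1" snapshot="$2" instance="$3" az="$4" device="$5" region="$6"
    local new_vol
    aws ec2 detach-volume --volume-id "$old_vol" --region "$region" || return 1
    new_vol="$(aws ec2 create-volume --snapshot-id "$snapshot" \
        --availability-zone "$az" --region "$region" \
        --query "VolumeId" --output text)" || return 1
    aws ec2 wait volume-available --volume-ids "$new_vol" --region "$region" || return 1
    aws ec2 attach-volume --volume-id "$new_vol" --instance-id "$instance" \
        --device "$device" --region "$region" || return 1
    aws ec2 start-instances --instance-ids "$instance" --region "$region"
}
```

For a secondary volume, the detach-volume call followed by start-instances is enough, because the instance doesn't need that volume to boot.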

Attached EBS volumes are encrypted and IAM permissions to access AWS KMS keys are insufficient

To resolve this issue, follow these steps to check the AWS Identity and Access Management (IAM) permissions.

  1. Run the describe-instances command to verify the StateReason error message and error code:

    $ aws ec2 describe-instances --instance-ids i-nnnnnnnnnnnnnnn --region us-east-1 --query "Reservations[].Instances[].{StateReason:StateReason}" --output json

    Note: Replace us-east-1 with your AWS Region. Replace i-nnnnnnnnnnnnnnn with your instance ID.

    If there's an encrypted volume that's attached to the instance and there are permissions or policy issues, then you receive a client error. You see an output that's similar to the following message:

    [
        [
            {
                "StateReason": {
                    "Code": "Client.InternalError",
                    "Message": "Client.InternalError: Client error on launch"
                }
            }
        ]
    ]

Verify that the user who tried to start the instance has the correct IAM permissions. If you launched the instance indirectly through another service, such as EC2 Auto Scaling, then also verify the following configurations:

Note: To verify if a volume is encrypted, open the Amazon EC2 console, and then select Volumes. Encrypted volumes show the label Encrypted in the Encryption column.
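You can also check the encryption status from the AWS CLI. The following sketch is one possible approach and not the only one. The function names are hypothetical, and the sketch assumes that your own credentials are allowed to call kms:DescribeKey and kms:GetKeyPolicy on the key.

```shell
# Show whether a volume is encrypted and which KMS key it uses.
volume_encryption() {
    aws ec2 describe-volumes --volume-ids "$1" --region "$2" \
        --query "Volumes[].[VolumeId,Encrypted,KmsKeyId]" --output text
}

# Show the state of the KMS key, for example Enabled or Disabled.
kms_key_state() {
    aws kms describe-key --key-id "$1" --region "$2" \
        --query "KeyMetadata.KeyState" --output text
}

# Print the default key policy so that you can review which principals
# are allowed to use the key.
kms_key_policy() {
    aws kms get-key-policy --key-id "$1" --policy-name default \
        --region "$2" --query "Policy" --output text
}
```

A key that isn't in the Enabled state, or a key policy that doesn't cover the user or service role that starts the instance, is consistent with the client error shown above.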

EC2 instance shutdown because of operating system disruptions such as the audit daemon

Verify the EC2 instance shutdown reason error code. If the EC2 instance shut down because of an OS error such as "Client.UserInitiatedShutdown", create a rescue instance. Then, follow these troubleshooting steps.

Verify the EC2 instance shutdown reason

To verify why the EC2 instance shut down, run the following command:

aws ec2 describe-instances --instance-ids i-xxxx --query 'Reservations[*].Instances[].StateReason'

Example output:

[
    {
        "Code": "Client.UserInitiatedShutdown",
        "Message": "Client.UserInitiatedShutdown: User initiated shutdown"
    }
]

Note: It's a best practice to initiate shutdown from the OS level.
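The StateReason codes that appear in this article each point to a different cause. The following small helper is a sketch that maps a code to the section to check. The function names are hypothetical, and the mapping covers only the cases that this article describes.

```shell
# Return only the StateReason code for an instance.
state_reason_code() {
    aws ec2 describe-instances --instance-ids "$1" --region "$2" \
        --query "Reservations[].Instances[].StateReason.Code" --output text
}

# Map a StateReason code to the causes covered in this article.
suggest_cause() {
    case "$1" in
        Server.InternalError)
            echo "Check for a device name conflict or an EBS volume in an error state." ;;
        Client.InternalError)
            echo "Check permissions to the AWS KMS key of encrypted volumes." ;;
        Client.UserInitiatedShutdown)
            echo "Check for an OS-level shutdown, such as the audit daemon." ;;
        *)
            echo "Code not covered in this article: $1" ;;
    esac
}
```

For example, `suggest_cause "$(state_reason_code i-xxxx us-east-1)"` prints a one-line hint for the instance.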

Create a rescue instance

Follow these steps to create a rescue instance. Because the instance is stopped, you must have the rescue instance to access parameters in auditd.conf.

  1. Open the Amazon EC2 console.

  2. Choose Instances from the navigation pane, and then choose the impaired instance.

  3. Stop the instance.

  4. Detach the Amazon EBS volume /dev/xvda from the stopped instance.

  5. Launch a new EC2 instance in the same Availability Zone as the impaired instance. The new instance becomes your rescue instance.

  6. Attach the Amazon EBS volume that you detached in step 4 to the rescue instance as a secondary device.

  7. Use SSH to connect to your rescue instance.

  8. Mount the volume at /mnt with the following command:

    $ sudo mount /dev/xvdf /mnt/

    Note: If the /mnt/var/log directory is empty or missing, then /var/log or /var/log/audit might be on a separate partition. Check /mnt/etc/fstab for an entry for that partition. Then, use the mount command from this step to mount the partition at the matching path under /mnt.
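Steps 3 through 6 above can also be sketched with the AWS CLI. This is an illustration only. The function name is hypothetical, and the sketch assumes that the rescue instance already exists in the same Availability Zone and that /dev/sdf is unused on it.

```shell
# Stop the impaired instance, detach its root volume, and attach the
# volume to the rescue instance as a secondary device.
# Arguments: impaired instance ID, root volume ID, rescue instance ID, Region.
move_root_to_rescue() {
    local impaired="$1" volume="$2" rescue="$3" region="$4"
    aws ec2 stop-instances --instance-ids "$impaired" --region "$region" &&
    aws ec2 wait instance-stopped --instance-ids "$impaired" --region "$region" &&
    aws ec2 detach-volume --volume-id "$volume" --region "$region" &&
    aws ec2 wait volume-available --volume-ids "$volume" --region "$region" &&
    # /dev/sdf typically appears as /dev/xvdf on Xen-based instances.
    # On Nitro-based instances the volume appears as an NVMe device instead.
    aws ec2 attach-volume --volume-id "$volume" --instance-id "$rescue" \
        --device /dev/sdf --region "$region"
}
```

After the volume is attached, run `lsblk` on the rescue instance to confirm the device name before you mount it.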

Verify the OS level logs

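To verify the OS level logs for the audit daemon, run the following commands on the rescue instance. Because the impaired instance is stopped, the commands read the logs from the volume that's mounted at /mnt.

RPM-based operating systems:

$ sudo grep -i "Audit daemon" /mnt/var/log/messages

Debian-based operating systems:

$ sudo grep -i "Audit daemon" /mnt/var/log/syslog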

The following output indicates that the disk space is low for the audit daemon and that the instance is stopped:

auditd[1009]: Audit daemon is low on disk space for logging
auditd[1009]: The audit daemon is now halting the system

Usually, a separate partition is mounted for /var/log or /var/log/audit. This partition can become full while the root partition still has free disk space. In this scenario, the "No space left on device" error doesn't occur, but the instance doesn't start up.

Verify disk space

To verify disk space, run the following command:

$ df -hT
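To spot a full log partition quickly, you can filter the df output. The following helper is a sketch. The function name and the default threshold of 90 percent are assumptions, and the helper expects the POSIX output format of `df -P` on standard input.

```shell
# Print the mount point and usage of every file system whose usage is at
# or above the threshold (default 90). Reads `df -P` output from stdin.
full_filesystems() {
    local threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the percent sign from Capacity
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, `df -P | full_filesystems 95` lists only the file systems that are at least 95 percent full. Mount points that contain spaces aren't handled by this sketch.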

Check the audit daemon action parameters

If a partition is full, then check the following audit daemon parameters in /etc/audit/auditd.conf. On the rescue instance, the file is at /mnt/etc/audit/auditd.conf.

admin_space_left

This parameter defines the amount of free disk space, in megabytes, below which the audit daemon performs the action that's set in admin_space_left_action.

admin_space_left_action

This parameter defines the action that the audit daemon performs when disk space is low. The valid values are ignore, syslog, rotate, email, exec, suspend, single, and halt.
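The following sketch shows one way to inspect and change these parameters from the rescue instance. The function names are hypothetical. The sketch assumes that the file uses the usual `key = value` layout with lowercase keys. The value that you choose for the action depends on your own audit requirements.

```shell
# Print the current admin_space_left and admin_space_left_action lines.
show_admin_space_settings() {
    grep -E -i '^[[:space:]]*admin_space_left(_action)?[[:space:]]*=' "$1"
}

# Set admin_space_left_action in place and keep a .bak copy of the
# original file next to it.
set_admin_space_action() {
    local conf="$1" action="$2"
    sed -i.bak -E \
        "s/^[[:space:]]*admin_space_left_action[[:space:]]*=.*/admin_space_left_action = ${action}/" \
        "$conf"
}
```

For example, with the volume mounted at /mnt, `sudo` access, and the functions loaded in a root shell, `set_admin_space_action /mnt/etc/audit/auditd.conf SUSPEND` replaces the current action with SUSPEND, which is one of the valid values listed above.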

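After you change the audit daemon action, unmount the volume and detach it from the rescue instance. Then, attach the Amazon EBS volume back to the original instance with its original root device name (/dev/xvda or /dev/sda1), and start the instance.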

For more information on how to change these parameters, see auditd.conf on the man7 website.

Note: You can also increase the Amazon EBS volume partition size.

Related information

Why can't I start or launch my EC2 instance?

Key policies in AWS KMS

Instance terminates immediately

AWS OFFICIAL
Updated 2 months ago