Why did my Amazon EC2 instance automatically or unexpectedly stop?

My Amazon Elastic Compute Cloud (Amazon EC2) instance automatically or unexpectedly stopped.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Identify why the EC2 instance stopped

Check the instance's StateReason code

To quickly identify why your instance stopped, run the following describe-instances AWS CLI command:

aws ec2 describe-instances --instance-ids i-1234567890abcdef0 --query "Reservations[].Instances[].{StateReason:StateReason}" --output json

Note: Replace i-1234567890abcdef0 with your instance ID.

In the output, check the StateReason value to identify why the instance stopped.
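If you save the command's JSON output, a short script can map each StateReason code to the section of this article that covers it. The following Python sketch assumes the output shape that the --query expression above produces (a list of objects with a StateReason key). The function name, the wording of the next steps, and the fallback text for other codes are illustrative and don't come from an AWS tool.

```python
import json

# Maps each StateReason code named in this article to a suggested next step.
NEXT_STEP = {
    "Client.UserInitiatedShutdown": "Check CloudTrail for the StopInstances caller",
    "Client.InstanceInitiatedShutdown": "Check the OS logs on the instance",
    "Server.SpotInstanceTermination": "Review Spot Instance interruptions",
}

# Illustrative fallback for codes that this sketch doesn't list.
FALLBACK = "Check maintenance events and automatic recovery"


def state_reasons(cli_output: str) -> list[tuple[str, str]]:
    """Return (code, suggested next step) for each instance in the CLI output."""
    results = []
    for item in json.loads(cli_output):
        reason = item.get("StateReason") or {}
        code = reason.get("Code", "unknown")
        results.append((code, NEXT_STEP.get(code, FALLBACK)))
    return results


# Sample output for one stopped instance (values are for illustration only).
sample = (
    '[{"StateReason": {"Code": "Client.UserInitiatedShutdown", '
    '"Message": "Client.UserInitiatedShutdown: User initiated shutdown"}}]'
)
print(state_reasons(sample))
```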

Check the instance's Event history in CloudTrail

Check the AWS CloudTrail Event history for the StopInstances event to get more information about why the instance stopped.

On the Event history page, check the following values:

  • Check eventTime to find the exact time when the stop action occurred.
  • Check userIdentity for the AWS Identity and Access Management (IAM) user, role, or service that initiated the stop.
  • Check userAgent to find the tool or service that the user used to make the API call, such as the AWS CLI or AWS Lambda.
  • Check requestParameters to find the instances that the API call stopped.
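To pull the same four values out of an exported event record, you can parse the record's JSON. The following Python sketch assumes the usual CloudTrail record layout for StopInstances, where instance IDs appear under requestParameters.instancesSet.items. The function name and all sample values are made up for illustration.

```python
import json


def summarize_stop_event(cloudtrail_event: str) -> dict:
    """Extract the fields listed above from one StopInstances event record."""
    record = json.loads(cloudtrail_event)
    identity = record.get("userIdentity") or {}
    params = record.get("requestParameters") or {}
    items = (params.get("instancesSet") or {}).get("items", [])
    return {
        "eventTime": record.get("eventTime"),
        "identityType": identity.get("type"),
        "identityArn": identity.get("arn"),
        "userAgent": record.get("userAgent"),
        "instanceIds": [item.get("instanceId") for item in items],
    }


# Hypothetical record; the account ID, user name, and times are placeholders.
sample_event = json.dumps({
    "eventTime": "2024-01-01T02:00:05Z",
    "eventName": "StopInstances",
    "userAgent": "aws-cli/2.15.0",
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::111122223333:user/example-user",
    },
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-1234567890abcdef0"}]}
    },
})
print(summarize_stop_event(sample_event))
```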

Troubleshoot the Client.UserInitiatedShutdown StateReason

If the StateReason is Client.UserInitiatedShutdown, then use the CloudTrail console to identify the user that initiated the stop action.

Or, run the following lookup-events command:

aws cloudtrail lookup-events \
 --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-1234567890abcdef0 \
 --start-time "starttime" \
 --end-time "endtime" \
 --query "Events[?EventName=='StopInstances']"

Note: Replace i-1234567890abcdef0 with your instance ID, starttime with the start of the time range that you want to search, and endtime with the end of the time range.

Check userIdentity for the IAM user, role, or service that initiated the stop. If the userIdentity value is an IAM user from your AWS account, then a user in your account manually stopped the instance. If the userIdentity value is an IAM role, then an automated process that uses the role stopped the instance.
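The rule above can be written as a small classifier over the userIdentity element. This Python sketch assumes the standard CloudTrail userIdentity fields (type, userName, sessionContext.sessionIssuer, invokedBy). The function name and the returned wording are illustrative.

```python
def describe_initiator(user_identity: dict) -> str:
    """Apply the user-versus-role rule to a CloudTrail userIdentity element."""
    identity_type = user_identity.get("type")
    if identity_type == "IAMUser":
        # An IAM user points to a manual stop by someone in the account.
        return f"manual stop by IAM user {user_identity.get('userName', 'unknown')}"
    if identity_type == "AssumedRole":
        # For an assumed role, the session issuer carries the role name.
        issuer = (user_identity.get("sessionContext") or {}).get("sessionIssuer") or {}
        return f"automated process using role {issuer.get('userName', 'unknown')}"
    if "invokedBy" in user_identity:
        # An AWS service made the call.
        return f"AWS service {user_identity['invokedBy']}"
    return f"other identity type: {identity_type}"


# Hypothetical identities for illustration.
print(describe_initiator({"type": "IAMUser", "userName": "example-user"}))
print(describe_initiator({
    "type": "AssumedRole",
    "sessionContext": {"sessionIssuer": {"userName": "example-stop-role"}},
}))
```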

To prevent future unexpected instance stops, remove the ec2:StopInstances permissions from IAM users and roles that aren't authorized to stop instances.
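To find the policies that grant the permission, you can scan each policy document for Allow statements that match ec2:StopInstances. The following Python sketch is a deliberately simplified check: it handles wildcards in Action, but it ignores Deny statements, NotAction, Resource, and Condition elements, so it isn't a substitute for IAM policy evaluation. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def allows_action(policy: dict, action: str = "ec2:StopInstances") -> bool:
    """Return True if any Allow statement's Action pattern matches the action."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # A single statement may not be in a list.
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in actions):
            return True
    return False


# Hypothetical policy documents for illustration.
broad_policy = {"Version": "2012-10-17",
                "Statement": [{"Effect": "Allow", "Action": "ec2:*", "Resource": "*"}]}
read_only_policy = {"Version": "2012-10-17",
                    "Statement": [{"Effect": "Allow", "Action": ["ec2:Describe*"], "Resource": "*"}]}
print(allows_action(broad_policy), allows_action(read_only_policy))
```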

Troubleshoot the Client.InstanceInitiatedShutdown StateReason

If the StateReason is Client.InstanceInitiatedShutdown, then the instance's operating system (OS) issued a shutdown or halt command. OS-initiated shutdowns bypass AWS APIs, so they don't generate StopInstances events in CloudTrail.

Start your instance, and then retrieve the instance console output. In the output, check for kernel panics, Out of Memory (OOM) messages, or shutdown sequences.
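If the console output is long, a script can flag the lines to look at first. The following Python sketch assumes that you saved the console output as text. The patterns are illustrative examples of kernel panic, OOM, and systemd shutdown messages, and you might need to adjust them for your distribution.

```python
import re

# Illustrative patterns; real console messages vary by OS and kernel version.
PATTERNS = {
    "kernel_panic": re.compile(r"kernel panic", re.IGNORECASE),
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "shutdown": re.compile(
        r"reached target.*(shutdown|power-off)|system halted|power down",
        re.IGNORECASE,
    ),
}


def scan_console_output(text: str) -> dict[str, list[str]]:
    """Group console lines by the kind of problem that they suggest."""
    hits = {label: [] for label in PATTERNS}
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.strip())
    return hits


# Made-up console lines for illustration.
sample_console = "\n".join([
    "Out of memory: Killed process 1234 (java)",
    "[  OK  ] Reached target Shutdown.",
    "Started Session 1 of user ec2-user.",
])
print(scan_console_output(sample_console))
```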

To identify what caused the instance to stop, run the following commands to check the logs based on your OS.

Linux:

# Check for shutdown/halt/reboot commands in auth logs
grep -i "shutdown\|halt\|poweroff\|reboot" /var/log/auth.log /var/log/secure 2>/dev/null

# Check system journal for previous boot's final messages
journalctl --list-boots
journalctl -b -1 -r | head -100

# Check for OOM killer events
journalctl -b -1 | grep -i "out of memory\|oom-killer"
# dmesg shows only the current boot, so read the previous boot's kernel log
journalctl -k -b -1 | grep -i "oom\|killed process"

# Check for kernel panic
journalctl -b -1 | grep -i "kernel panic\|BUG:"

# Check who/what initiated shutdown
last -x shutdown reboot | head -10

# Check if a scheduled shutdown was set
cat /run/systemd/shutdown/scheduled 2>/dev/null

Windows:

# Check Windows Event Log for shutdown events
# Event 1074 = user/process initiated shutdown
# Event 6006 = clean shutdown
# Event 6008 = unexpected shutdown (crash/BSOD)
Get-WinEvent -FilterHashtable @{LogName='System'; ID=1074,6006,6008} | Select-Object -First 10 | Format-List

# Check for BugCheck (BSOD) events
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WER-SystemErrorReporting'} -ErrorAction SilentlyContinue

To check for automated OS-level shutdown triggers on Linux instances, run the following commands:

# Check for shutdown scheduled via cron
crontab -l | grep -i "shutdown\|halt\|poweroff\|reboot"
sudo crontab -l | grep -i "shutdown\|halt\|poweroff\|reboot"

# Check systemd timers
systemctl list-timers --all | grep -i "shutdown\|reboot"

# Check if unattended-upgrades triggered a reboot (Debian/Ubuntu)
cat /var/log/unattended-upgrades/unattended-upgrades-shutdown.log 2>/dev/null

# Check watchdog configuration
systemctl status watchdog 2>/dev/null

To troubleshoot kernel panic or OOM issues, proceed to Troubleshoot instances with high resource usage.

Troubleshoot the Server.SpotInstanceTermination StateReason

If your instance is a Spot Instance and the StateReason is Server.SpotInstanceTermination, then Amazon EC2 reclaimed the capacity. For more information, see Why did Amazon EC2 interrupt my Spot Instance?

To work around Spot Instance interruptions, take the following actions:

  • Use a diversified fleet strategy across multiple instance types and Availability Zones.
  • Use Spot Instance interruption notices to gracefully manage shutdowns.
  • Use Capacity Rebalancing to proactively replace at-risk Spot Instances.
  • For workloads that can't be interrupted, use On-Demand or Reserved Instances instead.
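To act on an interruption notice, a process on the instance needs to know the action and how much time remains. The following Python sketch assumes that the notice body is the JSON document that instance metadata returns at spot/instance-action, with an action and a UTC time field. The code that retrieves the body from IMDS is omitted, and the function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_interruption_notice(body: str, now: datetime) -> tuple[str, float]:
    """Return the pending action and the seconds that remain until it happens."""
    notice = json.loads(body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


# Sample notice body and a fixed "current time" for illustration.
sample_notice = '{"action": "stop", "time": "2017-09-18T08:22:00Z"}'
current_time = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
action, seconds_left = parse_interruption_notice(sample_notice, current_time)
print(action, seconds_left)
```

A shutdown handler could use the remaining seconds to decide whether it has time to drain connections or should save a checkpoint immediately.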

Troubleshoot automated processes that stopped your instance

The userAgent field in CloudTrail shows whether a Lambda function, scheduled script, or other automated process stopped your instance. To stop or modify an automation that stopped your instance, take the following actions based on the source.

Update Lambda functions

If the userAgent field shows a Lambda function such as ssmApplicationInstancesToggle, then update the Lambda function and check the following configurations:

  • Check the function code for stop logic.
  • Check the function's triggers, such as Amazon EventBridge rules, schedules, or other event sources.
  • Modify the function logic to exclude instances that you don't want to stop.
  • Adjust the schedule.

Or, delete the function if you no longer need it.
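One way to modify the function logic is to skip instances that carry an opt-out tag. The following Python sketch shows only the selection step and assumes instance records in the shape that describe-instances returns. The DoNotStop tag name and the function name are hypothetical, and this isn't the code of any particular Lambda function.

```python
def instances_to_stop(instances: list[dict], exclude_tag: str = "DoNotStop") -> list[str]:
    """Return the IDs of instances that don't carry the opt-out tag."""
    selected = []
    for instance in instances:
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        # Skip any instance whose opt-out tag is set to "true" (any letter case).
        if tags.get(exclude_tag, "").lower() == "true":
            continue
        selected.append(instance["InstanceId"])
    return selected


# Made-up instance records for illustration.
fleet = [
    {"InstanceId": "i-0aaaaaaaaaaaaaaaa", "Tags": [{"Key": "DoNotStop", "Value": "true"}]},
    {"InstanceId": "i-0bbbbbbbbbbbbbbbb", "Tags": [{"Key": "Name", "Value": "batch"}]},
    {"InstanceId": "i-0cccccccccccccccc"},
]
print(instances_to_stop(fleet))
```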

Update Amazon EC2 Auto Scaling scale-in events

If your instance is in an Amazon EC2 Auto Scaling group, then the Auto Scaling group might terminate the instance during a scale-in event.

To check whether the instance belongs to an Auto Scaling group, run the following describe-auto-scaling-instances command:

aws autoscaling describe-auto-scaling-instances \
 --instance-ids i-1234567890abcdef0

Note: Replace i-1234567890abcdef0 with your instance ID.

To identify what caused the instance to stop, check the instance's scaling activities.

To make sure that a scale-in activity doesn't cause the instance to stop, activate instance scale-in protection.
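To see which group members a scale-in event can remove, check the ProtectedFromScaleIn value in the describe-auto-scaling-instances output. The following Python sketch assumes the default JSON output of that command. The function name and sample values are illustrative.

```python
import json


def unprotected_members(cli_output: str) -> list[tuple[str, str]]:
    """Return (instance ID, group name) for members without scale-in protection."""
    instances = json.loads(cli_output).get("AutoScalingInstances", [])
    return [
        (item["InstanceId"], item["AutoScalingGroupName"])
        for item in instances
        if not item.get("ProtectedFromScaleIn", False)
    ]


# Made-up command output for illustration.
sample_asg_output = json.dumps({
    "AutoScalingInstances": [
        {"InstanceId": "i-1234567890abcdef0", "AutoScalingGroupName": "example-asg",
         "LifecycleState": "InService", "ProtectedFromScaleIn": False},
        {"InstanceId": "i-0fedcba0987654321", "AutoScalingGroupName": "example-asg",
         "LifecycleState": "InService", "ProtectedFromScaleIn": True},
    ]
})
print(unprotected_members(sample_asg_output))
```

An empty AutoScalingInstances list means that the instance doesn't belong to an Auto Scaling group.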

Update scheduled scripts or cron jobs

If userAgent shows the AWS CLI and the call used temporary credentials that an instance retrieved from the Instance Metadata Service (IMDS), then a script on that instance stopped the instances.

To identify the script, connect to the instance that's listed in CloudTrail.

For Windows instances, run the following command to check Task Scheduler:

Get-ScheduledTask | Where-Object {$_.State -ne "Disabled"} | Select-Object TaskName, TaskPath, State

For Linux instances, run the following commands to check cron jobs for all users:

# Check current user's cron jobs
crontab -l

# Check root user's cron jobs
sudo crontab -l

# List all users' cron jobs
for user in $(cut -f1 -d: /etc/passwd); do echo "Cron jobs for $user:"; sudo crontab -u "$user" -l 2>/dev/null; done

Then, run the following command to review the cron logs to identify the user and script that ran during the time of the stop event:

sudo grep "time-of-stop-event" /var/log/cron

Note: Replace time-of-stop-event with the time when the instance stopped.
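When the log is large, it can help to list only the user and command for each job that ran at the time of the stop. The following Python sketch assumes the common /var/log/cron line layout (timestamp, host, process, user in parentheses, then CMD and the command in parentheses). That layout is an assumption and can differ between distributions. The function name and sample lines are hypothetical.

```python
import re

# Assumed layout: "Mar 10 02:00:01 host CROND[1234]: (user) CMD (command)".
CRON_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S+:\s+"
    r"\((?P<user>[^)]+)\)\s+CMD\s+\((?P<cmd>.*)\)\s*$"
)


def cron_commands_at(log_text: str, time_of_stop: str) -> list[tuple[str, str]]:
    """Return (user, command) for cron jobs whose timestamp contains time_of_stop."""
    matches = []
    for line in log_text.splitlines():
        found = CRON_LINE.match(line)
        if found and time_of_stop in found["ts"]:
            matches.append((found["user"], found["cmd"]))
    return matches


# Made-up log lines for illustration.
sample_cron_log = "\n".join([
    "Mar 10 02:00:01 ip-10-0-0-1 CROND[1234]: (root) CMD (/usr/local/bin/nightly-stop.sh)",
    "Mar 10 02:00:01 ip-10-0-0-1 crond[1]: (CRON) INFO (running with inotify support)",
    "Mar 10 03:00:01 ip-10-0-0-1 CROND[1300]: (ec2-user) CMD (/home/ec2-user/backup.sh)",
])
print(cron_commands_at(sample_cron_log, "Mar 10 02:00"))
```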

After you identify the cron job or scheduled task that stopped the instance, take one of the following actions:

  • Modify the script logic so that it doesn't stop instances.
  • Adjust the schedule.
  • Remove the script.

If you can't identify the automation that stopped the instance, then update the instance that's listed in CloudTrail so that it can't stop instances. Modify the instance's IAM role permissions policy to remove the ec2:StopInstances and ec2:StartInstances permissions.

Check whether there was a scheduled maintenance event

AWS periodically performs maintenance on the underlying hardware that hosts instances.

To check whether a scheduled maintenance event occurred, run the following describe-instance-status command:

aws ec2 describe-instance-status \
 --instance-ids i-1234567890abcdef0 \
 --include-all-instances \
 --query "InstanceStatuses[*].Events"

Note: Replace i-1234567890abcdef0 with your instance ID.

For more information, see How do I manage and reschedule Amazon EC2 instance maintenance events?
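To separate the events that stop an instance from events such as reboots, filter the command output by event code. The following Python sketch assumes the output of the command above, which is a list with one list of events for each instance. It treats instance-stop and instance-retirement as the codes that stop an instance, and it treats a description that starts with [Completed] as an event that already occurred. The function name and sample values are illustrative.

```python
import json

# Event codes that this sketch treats as stopping the instance.
STOPPING_CODES = {"instance-stop", "instance-retirement"}


def stopping_events(cli_output: str) -> list[tuple[str, str, bool]]:
    """Return (code, not-before time, completed) for events that stop an instance."""
    results = []
    for instance_events in json.loads(cli_output):
        # An instance with no events can appear as null in the projection.
        for event in instance_events or []:
            if event.get("Code") in STOPPING_CODES:
                completed = event.get("Description", "").startswith("[Completed]")
                results.append((event["Code"], event.get("NotBefore", ""), completed))
    return results


# Made-up command output for illustration.
sample_events = json.dumps([
    [
        {"Code": "instance-stop",
         "Description": "[Completed] The instance is running on degraded hardware",
         "NotBefore": "2024-01-01T00:00:00+00:00"},
        {"Code": "system-reboot", "Description": "Scheduled reboot",
         "NotBefore": "2024-02-01T00:00:00+00:00"},
    ],
    None,
])
print(stopping_events(sample_events))
```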

When AWS schedules a maintenance event, you receive an alert before the event's date and time. If you didn't receive email notifications about scheduled maintenance, then take the following actions:

  • Make sure that the primary email address that's associated with your account is correct.
  • Check spam or junk folders for AWS notifications.
  • Set up proactive notifications through the AWS Health Dashboard to receive alerts through multiple channels.

Identify automatic recovery actions

System status checks detect issues with the underlying host hardware or AWS infrastructure. If your instance failed a system status check, then Amazon EC2 runs EC2 Auto Recovery. By default, EC2 Auto Recovery is activated on instance types that support automatic recovery. To check whether an automatic recovery action stopped your instance, see Verify if automatic instance recovery occurred.

In CloudTrail, the invokedBy value is monitoring.amazonaws.com for events that occurred because of automatic recovery.

Example event:

{
  "eventSource": "ec2.amazonaws.com",
  "eventName": "StopInstances",
  "userIdentity": {
    "invokedBy": "monitoring.amazonaws.com"
  }
}

Important: Instance status check failures don't result in automatic recovery. Only system status checks do.
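To check exported CloudTrail records for automatic recovery, test the same three fields that the example event shows. The following Python sketch is a minimal check with a hypothetical function name.

```python
def is_automatic_recovery(record: dict) -> bool:
    """Return True if a CloudTrail record matches the example event's fields."""
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventSource") == "ec2.amazonaws.com"
        and record.get("eventName") == "StopInstances"
        and identity.get("invokedBy") == "monitoring.amazonaws.com"
    )


# The example event from this section, followed by a user-initiated stop.
recovery_event = {
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StopInstances",
    "userIdentity": {"invokedBy": "monitoring.amazonaws.com"},
}
manual_event = {
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StopInstances",
    "userIdentity": {"type": "IAMUser"},
}
print(is_automatic_recovery(recovery_event), is_automatic_recovery(manual_event))
```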

For more information about status checks, see How do I troubleshoot status check failures on my Amazon EC2 instance?

Troubleshoot instances with high resource usage

If your instance's resource usage is high, then the instance might become unresponsive. If you configured an automation to stop unresponsive instances, then the automation stops the instance. If out-of-memory issues result in kernel panic errors, then the OS might also initiate a shutdown.

If your instance stopped during a period of high resource usage, then check your Amazon CloudWatch alarms, EventBridge rules, and AWS Systems Manager automation. Identify whether one of the automations caused the instance to stop.
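For CloudWatch alarms, one quick check is whether any alarm has an EC2 stop action attached. The following Python sketch assumes the JSON output of the describe-alarms command and assumes that EC2 stop actions use ARNs in the arn:aws:automate:region:ec2:stop form. It doesn't cover EventBridge rules or Systems Manager automation. The function name and sample values are illustrative.

```python
import json


def alarms_that_stop(cli_output: str) -> list[str]:
    """Return the names of alarms whose actions include an EC2 stop action."""
    names = []
    for alarm in json.loads(cli_output).get("MetricAlarms", []):
        actions = alarm.get("AlarmActions", [])
        if any(":automate:" in arn and arn.endswith(":ec2:stop") for arn in actions):
            names.append(alarm["AlarmName"])
    return names


# Made-up describe-alarms output for illustration.
sample_alarms = json.dumps({
    "MetricAlarms": [
        {"AlarmName": "example-high-cpu-stop",
         "AlarmActions": ["arn:aws:automate:us-east-1:ec2:stop"]},
        {"AlarmName": "example-notify-only",
         "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:example-topic"]},
    ]
})
print(alarms_that_stop(sample_alarms))
```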

To troubleshoot high resource usage issues, see the following AWS Knowledge Center troubleshooting articles:

Update your configuration to get notifications about instance stops

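Note: It's a best practice to activate stop protection for instances that must remain running. Stop protection blocks stop requests that come through the Amazon EC2 console, AWS CLI, or API. It doesn't prevent a shutdown that the OS initiates from inside the instance.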

To manage and monitor your instance state, configure the following settings:

Related information

Why did Amazon EC2 unexpectedly terminate my instance?

Why did my Amazon EC2 Linux instance reboot or restart itself?

AWS OFFICIAL, updated 21 days ago