AWS instance randomly becomes unresponsive


My AWS instance randomly becomes unresponsive every day. I can't ping it and all ports on the public IP are inaccessible, but the AWS dashboard shows the instance as running. The only way to fix it is to reboot, but I don't want to have to do this every day.

The instance reachability check fails, but the system status check passes.

CPU utilization isn't even high, so I know it isn't crashing from load or anything like that.

The system log doesn't show anything wrong either (pastebin: ...). My instance type doesn't support the EC2 serial console, so I can't access that either.

3 Answers

What stands out to me here is the CPU credit balance graph. It drops to zero at 23:15, which throttles the instance to its baseline performance and may be what is affecting its availability.
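To illustrate why a zero balance matters, here is a rough hourly model of standard-mode credit accounting. It relies on the documented definition that one CPU credit equals one vCPU running at 100% for one minute; the t3.micro-like figures used below (2 vCPUs, 10% baseline per vCPU, a 288-credit cap) are illustrative assumptions, and `BurstableSpec` and `simulate` are hypothetical names, not an AWS API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BurstableSpec:
    vcpus: int
    baseline_pct: float  # assumed baseline utilization per vCPU, in percent
    max_balance: float   # assumed cap on accrued credits

    @property
    def credits_per_hour(self) -> float:
        # One credit = one vCPU at 100% for one minute, so running every vCPU
        # at the baseline for an hour costs exactly what is earned per hour.
        return self.vcpus * self.baseline_pct * 60 / 100


def simulate(spec: BurstableSpec, demand_pct: List[float],
             start_balance: float) -> List[Tuple[float, float]]:
    """Return (balance at end of hour, delivered utilization %) for each hour.

    demand_pct is the average per-vCPU utilization the workload wants each hour.
    Standard mode is assumed: usage is capped so it never spends more than the
    current balance plus the credits earned during that hour.
    """
    balance = start_balance
    history = []
    for demand in demand_pct:
        available = balance + spec.credits_per_hour
        wanted = spec.vcpus * demand * 60 / 100  # credits this hour would cost
        if wanted <= available:
            spent, delivered = wanted, demand
        else:
            # Throttled: only the available credits can be spent.
            spent = available
            delivered = spent * 100 / (spec.vcpus * 60)
        balance = min(available - spent, spec.max_balance)
        history.append((balance, delivered))
    return history


if __name__ == "__main__":
    t3_micro_like = BurstableSpec(vcpus=2, baseline_pct=10, max_balance=288)
    for hour, (bal, cpu) in enumerate(simulate(t3_micro_like, [40] * 4, 60), 1):
        print(f"hour {hour}: balance={bal:.0f} credits, delivered CPU={cpu:.0f}%")
```

With demand held at 40%, delivered utilization falls to the 10% baseline once the balance is exhausted, which would also be consistent with a CPU graph that never looks high.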

You can solve this by switching the instance to "Unlimited" credit mode. In that mode, performance isn't throttled when the CPU credit balance reaches zero, but you are charged for the extra CPU usage. If this happens regularly, a better option would be a fixed-performance instance type such as the M, R or C families.
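A minimal sketch of making that switch with boto3, assuming boto3 is installed, AWS credentials are configured, and the caller is allowed to call `ec2:ModifyInstanceCreditSpecification`. The instance ID is a placeholder and the helper names are hypothetical; only `modify_instance_credit_specification` is the real EC2 client method.

```python
from typing import Optional


def credit_spec_request(instance_id: str, mode: str = "unlimited") -> dict:
    """Build keyword arguments for EC2's ModifyInstanceCreditSpecification."""
    if mode not in ("standard", "unlimited"):
        raise ValueError(f"unsupported credit mode: {mode!r}")
    return {
        "InstanceCreditSpecifications": [
            {"InstanceId": instance_id, "CpuCredits": mode}
        ]
    }


def set_credit_mode(instance_id: str, mode: str = "unlimited",
                    region: Optional[str] = None) -> dict:
    """Apply the credit mode and return the raw API response."""
    import boto3  # imported here so credit_spec_request works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.modify_instance_credit_specification(
        **credit_spec_request(instance_id, mode)
    )
    # Per-instance failures can be reported in this list rather than raised.
    failed = response.get("UnsuccessfulInstanceCreditSpecifications", [])
    if failed:
        raise RuntimeError(f"credit mode change failed: {failed}")
    return response


# Example (placeholder ID):
# set_credit_mode("i-0123456789abcdef0")
```

Surplus credits spent in unlimited mode are billed, so it is worth keeping an eye on charges after making this change.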

answered 13 days ago

My best recommendation would be to open a support case so AWS Support can help with this.

answered 13 days ago


In addition to Brettski's suggestion of switching to "Unlimited" mode, I'd also advise checking the instance's memory usage. According to our documentation [1], common reasons for StatusCheckFailed_Instance are the following:

  - Failed system status checks
  - Incorrect networking or startup configuration
  - Exhausted memory
  - Corrupted file system
  - Incompatible kernel
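To confirm which check is failing without opening the console, a sketch like the following reads the output of the EC2 `describe_instance_status` call (a real boto3 method; `IncludeAllInstances=True` also returns instances that are not running). `classify` and `fetch_statuses` are hypothetical helper names, and the labels are a simplification: an impaired system check points to the underlying AWS host, while an impaired instance check with a healthy system check, as in the question, points to causes like those listed above.

```python
from typing import Dict, Iterable, Optional


def classify(response: dict) -> Dict[str, str]:
    """Label each instance in a DescribeInstanceStatus response."""
    labels = {}
    for entry in response.get("InstanceStatuses", []):
        system = entry.get("SystemStatus", {}).get("Status")
        instance = entry.get("InstanceStatus", {}).get("Status")
        if system == "impaired":
            label = "system check failing (AWS host side)"
        elif instance == "impaired":
            label = "instance check failing (OS/configuration side)"
        elif system == "ok" and instance == "ok":
            label = "ok"
        else:
            # e.g. "initializing", "insufficient-data", "not-applicable"
            label = f"undetermined (system={system}, instance={instance})"
        labels[entry["InstanceId"]] = label
    return labels


def fetch_statuses(instance_ids: Iterable[str],
                   region: Optional[str] = None) -> Dict[str, str]:
    import boto3  # needs credentials allowed to call ec2:DescribeInstanceStatus

    ec2 = boto3.client("ec2", region_name=region)
    return classify(ec2.describe_instance_status(
        InstanceIds=list(instance_ids), IncludeAllInstances=True))
```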

Based on the symptoms you described and your workaround, this sounds a lot like a memory issue. The docs [2] state: "Reboot the instance to return it to a non-impaired status. The problem will probably occur again unless you change the instance type."
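If memory is the culprit, the Linux kernel's OOM killer often leaves messages in the kernel log. Below is a rough sketch for scanning saved log text for them; the message formats matched are assumptions based on common kernels (older ones print "Kill process", newer ones "Killed process"), so adjust the patterns to what your distribution actually logs. `find_oom_events` is a hypothetical helper. The kernel ring buffer (`dmesg`) is cleared on reboot, so the text needs to come from something that survives it, such as persisted syslog files.

```python
import re
from typing import Dict, List, Tuple

# Assumed message formats; exact wording varies between kernel versions.
KILLED_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")
INVOKED_RE = re.compile(r"invoked oom-killer")


def find_oom_events(log_text: str) -> Dict[str, object]:
    """Count OOM-killer invocations and list (pid, process name) of victims."""
    invoked = 0
    killed: List[Tuple[int, str]] = []
    for line in log_text.splitlines():
        if INVOKED_RE.search(line):
            invoked += 1
        match = KILLED_RE.search(line)
        if match:
            killed.append((int(match.group(1)), match.group(2)))
    return {"invoked": invoked, "killed": killed}


# Example (the log path depends on the distribution):
# with open("/var/log/messages") as f:
#     print(find_oom_events(f.read()))
```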

As you know, EC2 does not natively monitor memory metrics, so this has to be configured manually. There are two approaches:

  1. Configure ATOP/SAR to log resource utilization
  2. Use the CloudWatch agent to log memory utilization
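For a sense of what approach 1 records, here is a minimal stand-in (not a replacement for atop or sar) that samples `/proc/meminfo` on Linux and prints used memory. It computes usage from `MemTotal` and `MemAvailable`; `MemAvailable` requires Linux 3.14 or later. The function names are hypothetical, and the output would need to be redirected to a file that survives reboots to be useful for this problem.

```python
import time
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; values are mostly kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def mem_used_percent(info: Dict[str, int]) -> float:
    # MemAvailable estimates memory usable for new work without swapping.
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total


def log_memory(path: str = "/proc/meminfo", interval: float = 60,
               samples: Optional[int] = None) -> None:
    """Print a timestamped usage line every `interval` seconds."""
    taken = 0
    while samples is None or taken < samples:
        with open(path) as f:
            pct = mem_used_percent(parse_meminfo(f.read()))
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(f"{stamp} mem_used_percent={pct:.1f}", flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```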

In addition to the resources above, this third-party article [3] does a good job of summarizing how to configure the CloudWatch agent to log memory utilization.
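For the CloudWatch agent route, the agent reads a JSON configuration file. A minimal sketch that writes a memory-only configuration from Python follows; the `mem_used_percent` measurement and the `${aws:InstanceId}` dimension placeholder come from the agent's configuration format, while the helper names and output path are assumptions. The agent is typically loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`, the instance role needs permission to publish metrics (for example the `CloudWatchAgentServerPolicy` managed policy), and metrics appear under the `CWAgent` namespace by default.

```python
import json


def memory_agent_config(interval_seconds: int = 60) -> dict:
    """Minimal CloudWatch agent config that only collects memory usage."""
    return {
        "metrics": {
            # Tag each data point with the instance ID so it can be graphed per instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


def write_config(path: str, interval_seconds: int = 60) -> None:
    with open(path, "w") as f:
        json.dump(memory_agent_config(interval_seconds), f, indent=2)


# Example (hypothetical path; pass the same path to fetch-config via file:<path>):
# write_config("/opt/aws/amazon-cloudwatch-agent/etc/memory-config.json")
```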

I hope these resources help. Feel free to comment on this thread if there is anything else we can assist with.

[1] Instance status checks

[2] Suggested actions

[3] How to monitor memory usage on AWS EC2?

answered 12 days ago
