EC2 Instance reachability check failed - 3 times in 3 months

0

In the last 3 months I have had to reboot our instance 3 times because of the "Instance reachability check failed" error. When I reboot, the problem goes away for a while, until suddenly our app breaks and I have to log in and reboot again. I honestly don't know what to look for in the logs to find out why this keeps happening. I have a log file, but don't see a way to upload it here, so I hosted it on Dropbox.

asked a year ago · 322 views
3 Answers
1

What instance type is it and what kind of workload is it running?

One place to start looking would be under the Monitoring tab of the EC2 instance in AWS Console, and note if there is anything that looks concerning, e.g. if CPU usage becomes unusually high.

It would also be worth setting up the CloudWatch agent to collect more detailed system metrics and logs (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html). This may show that the root cause is exhaustion of some system resource, such as memory or disk space, which the default EC2 metrics do not report.
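
To line the failures up with whatever the logs show, it can help to pull the exact times the reachability check failed. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names are made up for illustration. It queries the standard `StatusCheckFailed_Instance` metric at one-hour resolution so a 30-day window stays within the per-call datapoint limit.

```python
from datetime import datetime, timedelta, timezone


def status_check_query(instance_id, days=30, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Maximum"],  # 1 if the check failed at any point in the hour
    }


def failed_periods(datapoints):
    """Return the sorted timestamps of datapoints where the check failed."""
    return sorted(p["Timestamp"] for p in datapoints if p["Maximum"] >= 1)


def print_failures(instance_id):
    """Query CloudWatch and print the hours in which the check failed."""
    import boto3  # imported here so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**status_check_query(instance_id))
    for timestamp in failed_periods(response["Datapoints"]):
        print("instance status check failed during the hour starting", timestamp)
```

Calling `print_failures` with your own instance ID gives a list of hours to compare against the CPU graph and the system logs.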

EXPERT
answered a year ago
0

The recurring Instance reachability check failures you're experiencing with your EC2 instance are concerning and indicate an underlying issue that needs to be addressed. While rebooting the instance provides a temporary fix, it's important to identify and resolve the root cause to prevent future disruptions.

To troubleshoot this issue, you should focus on the following areas:

  1. System logs: Review the system logs for any error messages, kernel panics, or other indicators of system instability. These logs can provide valuable insights into what's causing the reachability issues.

  2. Resource utilization: Monitor your instance's CPU, memory, and disk usage. Overutilization of these resources can lead to instance unresponsiveness.

  3. Application logs: Examine your application logs for any errors or unusual behavior that might be causing the instance to become unreachable.

  4. Network configuration: Verify that your network settings, security groups, and VPC configurations are correct and not causing connectivity issues.

  5. Instance type and size: Ensure that your current instance type and size are adequate for your workload. If your application is resource-intensive, you may need to consider upgrading to a larger instance type.

  6. EBS volume performance: Check if your EBS volumes are performing as expected and not causing I/O bottlenecks.
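
For the system log review in item 1, a rough first pass is to search the log for kernel messages that commonly accompany an unresponsive Linux instance. This is only a sketch: the pattern list is illustrative rather than exhaustive, the function names are hypothetical, and the log path depends on the distribution (on Ubuntu it is typically /var/log/syslog or /var/log/kern.log).

```python
import re

# Kernel messages that often show up around an instance becoming unresponsive.
# Illustrative only; extend the list for your own system.
PATTERNS = {
    "out of memory": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
    "kernel panic": re.compile(r"Kernel panic", re.IGNORECASE),
    "hung task": re.compile(r"blocked for more than \d+ seconds"),
    "disk I/O error": re.compile(r"I/O error"),
    "filesystem full": re.compile(r"No space left on device"),
}


def scan_log(lines):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip()))
                break  # report each line once, under the first matching label
    return hits


def scan_file(path):
    """Scan a log file, replacing any bytes that are not valid text."""
    with open(path, errors="replace") as handle:
        return scan_log(handle)
```

Matches that cluster shortly before each reboot are the ones worth reading in full; an empty result does not rule these causes out, since a hung instance may not get the chance to write anything.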

To prevent future issues, consider implementing the following best practices:

  1. Set up Amazon CloudWatch alarms to monitor your instance's health and performance metrics. This can help you detect and respond to issues before they cause downtime.

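  2. Use EC2 automatic recovery to recover your instance if it becomes impaired due to underlying hardware issues. Automatic recovery responds to system status check failures, so it may not trigger for an instance status check failure such as the reachability check.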

  3. Implement a more robust monitoring and logging solution to capture detailed information about your instance and application performance.

  4. Regularly review and optimize your application code and infrastructure to ensure efficient resource utilization.
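
As one possible shape for the alarm in item 1, the sketch below builds the parameters for a CloudWatch alarm on `StatusCheckFailed_Instance` that reboots the instance after three consecutive failing minutes. It assumes boto3 and configured credentials; the alarm name and function names are invented for illustration. The metric is published by EC2 itself, so it should not depend on the CloudWatch agent or SSM Agent being online.

```python
def reboot_alarm_params(instance_id, region):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{instance_id}-instance-status-check-failed",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # evaluate one-minute datapoints
        "EvaluationPeriods": 3,  # require three failing minutes in a row
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 alarm action that reboots the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:reboot"],
    }


def create_reboot_alarm(instance_id, region):
    """Create or update the alarm in the given region."""
    import boto3  # imported here so the helper above works without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**reboot_alarm_params(instance_id, region))
```

An automatic reboot only shortens the outage. It does not explain why the check fails, so it is best paired with an SNS notification or a look at the logs after each trigger.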

If you continue to experience issues after implementing these measures, it may be helpful to engage AWS Support for further assistance in diagnosing and resolving the problem.
Sources
Troubleshoot EC2 Windows instance status check failures | AWS re:Post
Understand why an EC2 instance is unreachable before a degradation deadline | AWS re:Post
Troubleshoot issues with Amazon EC2 instances - Amazon Elastic Compute Cloud
Troubleshoot status check failures on an unreachable EC2 Linux instance | AWS re:Post

answered a year ago
0

OK, thanks for responding! It's an Ubuntu system. I have looked back 12 months and there is nothing indicating that the system is running out of resources. Highest CPU is 36%. I have had no training and am flying by the seat of my pants here. ChatGPT and I are trying to figure this out. I was trying to set up a CloudWatch alert, but it says my instance's SSM Agent status is offline. Working on that ...

answered a year ago