EC2 failing health check ( Instance reachability check failed )

We have multiple EC2 instances, and two of them show a 1/2 status check. The problem comes up every morning: the status check fails, we restart the server, and it keeps working until the end of the day. The next day it fails the status check again. Because this was happening daily, we created images of the servers and launched new servers from those AMIs. The problem is still the same: every day the status check fails, a restart fixes it, and it comes back the next day.

asked 10 months ago, 295 views
1 Answer
EC2 instances that fail the instance reachability check every day, recover after a restart, and fail again the next day point to a recurring underlying cause rather than a one-time glitch.

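A 1/2 status check with "Instance reachability check failed" means the instance status check failed while the system status check passed, so the problem is typically inside the instance (operating system or configuration) rather than on the underlying AWS host. This can be caused by several factors: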

  1. Failure to boot the operating system
  2. Failure to correctly mount the volumes
  3. Exhausted CPU and memory resources
  4. Kernel panic
  5. Network failure
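
Before working through these causes, it can help to confirm which of the two checks is failing. The following is a minimal sketch, assuming a dictionary shaped like the response of the EC2 DescribeInstanceStatus API (for example, what boto3's `describe_instance_status` returns). The function name `failing_checks` and the sample instance IDs are hypothetical.

```python
from typing import Any, Dict, List


def failing_checks(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each instance ID to the status check details that did not pass."""
    result: Dict[str, List[str]] = {}
    for item in response.get("InstanceStatuses", []):
        failed: List[str] = []
        # "SystemStatus" reflects the AWS host side, "InstanceStatus" the guest OS side.
        for key, label in (("SystemStatus", "system"), ("InstanceStatus", "instance")):
            status = item.get(key, {})
            if status.get("Status", "ok") == "ok":
                continue
            for detail in status.get("Details", []):
                if detail.get("Status") != "passed":
                    failed.append(f"{label}:{detail.get('Name')}={detail.get('Status')}")
        if failed:
            result[item["InstanceId"]] = failed
    return result


# Hypothetical sample data, not taken from the question.
sample = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0123456789abcdef0",
            "SystemStatus": {"Status": "ok", "Details": [{"Name": "reachability", "Status": "passed"}]},
            "InstanceStatus": {"Status": "impaired", "Details": [{"Name": "reachability", "Status": "failed"}]},
        },
        {
            "InstanceId": "i-0fedcba9876543210",
            "SystemStatus": {"Status": "ok", "Details": [{"Name": "reachability", "Status": "passed"}]},
            "InstanceStatus": {"Status": "ok", "Details": [{"Name": "reachability", "Status": "passed"}]},
        },
    ]
}

print(failing_checks(sample))
# {'i-0123456789abcdef0': ['instance:reachability=failed']}
```

An entry labeled `instance:` with no `system:` entry matches the situation described in the question.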

Since recreating the servers from AMIs didn't resolve the issue, it's likely that the problem is related to something that occurs during the daily operation of the instances, rather than an issue with the initial configuration.
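
One way to test that reasoning is to check whether the failures cluster around the same hour each day. The sketch below assumes you have collected the failure start times yourself as ISO 8601 strings with an explicit UTC offset; the function name `failure_hours` and the timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def failure_hours(timestamps: Iterable[str]) -> List[Tuple[int, int]]:
    """Count failures per hour of day, most frequent hour first."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return hours.most_common()


# Hypothetical failure times collected over four days.
observed = [
    "2024-05-01T03:12:00+00:00",
    "2024-05-02T03:05:00+00:00",
    "2024-05-03T03:20:00+00:00",
    "2024-05-04T14:40:00+00:00",
]

print(failure_hours(observed))
# [(3, 3), (14, 1)]
```

A strong peak at one hour suggests something scheduled, such as a cron job or a backup, rather than gradual resource exhaustion.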

Here are some steps you can take to troubleshoot and potentially resolve the issue:

  1. Check system logs: Examine the system logs for any errors or warnings that occur before the instance becomes unreachable. This may provide clues about what's causing the failure.

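  2. Monitor resource utilization: Use Amazon CloudWatch to monitor CPU, memory, and disk usage. CloudWatch reports CPU utilization by default, but memory and disk usage require the CloudWatch agent to be installed on the instance. If resources are being exhausted by the end of the day, this could explain why the instances become unreachable overnight.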

  3. Investigate scheduled tasks: Look for any scheduled tasks or cron jobs that run overnight which might be causing the instances to become unresponsive.

  4. Check for software updates: Ensure that all software, including the operating system, is up to date. Sometimes, outdated software can lead to instability.

  5. Examine network configuration: Verify that there are no network-related issues, such as security group changes or network ACL updates, that might be impacting the instances' reachability.

  6. Consider automatic replacement: If the instances are in an Auto Scaling group, Auto Scaling replaces instances that it marks as unhealthy, and an instance maintenance policy controls whether the replacement launches before or after the unhealthy instance is terminated, which helps minimize downtime.

  7. Adjust the health check grace period: If the instances are in an Auto Scaling group, especially one that uses Elastic Load Balancing health checks, ensure that the health check grace period is long enough to allow instances to fully start up and initialize before being marked as unhealthy.

  8. Investigate application-specific issues: If the problem is related to your application, consider implementing more robust error handling and recovery mechanisms.
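
As a starting point for step 1, the sketch below scans log lines (for example, from the instance's system log or the output of `aws ec2 get-console-output`) for messages that hint at the causes listed earlier. The patterns are illustrative assumptions only; actual messages vary by kernel version and distribution, and the function name `classify_log` is hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Illustrative patterns only; adjust them to the messages your OS actually emits.
PATTERNS = {
    "kernel panic": re.compile(r"kernel panic", re.IGNORECASE),
    "out of memory": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "mount failure": re.compile(r"failed to mount|unable to mount", re.IGNORECASE),
    "network failure": re.compile(r"network is unreachable", re.IGNORECASE),
}


def classify_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the suspected cause whose pattern they match."""
    hits: Dict[str, List[str]] = {}
    for line in lines:
        for cause, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(cause, []).append(line.strip())
    return hits


# Hypothetical log excerpt, not taken from the question.
sample_lines = [
    "[ 1234.5] Out of memory: Killed process 4321 (java)",
    "systemd[1]: Started Daily apt download activities.",
    "[ 1300.1] Kernel panic - not syncing: Fatal exception",
]

for cause, matched in classify_log(sample_lines).items():
    print(cause, "->", matched)
```

A match is only a hint about where to look next; an empty result does not rule out any of the causes.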

Remember, it's crucial to identify the root cause of the issue rather than relying on daily restarts as a long-term solution. If the problem persists after thorough investigation and troubleshooting, consider reaching out to AWS Support for further assistance.

Sources
Troubleshoot status check failures on an unreachable EC2 Linux instance | AWS re:Post
View the reason for health check failures - Amazon EC2 Auto Scaling
Troubleshoot unhealthy instances in Amazon EC2 Auto Scaling - Amazon EC2 Auto Scaling
View health check status and the reason for health check failures - Amazon EC2 Auto Scaling

answered 10 months ago
EXPERT
reviewed 10 months ago
