Based on your description, there are several potential causes for your G5 instance becoming unresponsive despite showing normal CPU and memory utilization in CloudWatch:
- Network latency or connectivity issues: Even with low resource utilization, network problems can cause connection failures.
- Windows-specific issues (if running Windows): The system might be running background processes like Windows Update, security software scans, custom startup scripts, or scheduled tasks that temporarily affect connectivity without significantly impacting overall CPU metrics.
- Application-specific problems: Your application might have internal bottlenecks or deadlocks that don't necessarily show up as high resource utilization.
- External service dependencies: If your application relies on external services, delays or failures in those services could impact your instance's responsiveness.
To troubleshoot this issue:
- Check system and application logs for clues about what happens before the instance becomes unresponsive.
- Monitor more detailed metrics by installing the CloudWatch agent to collect additional data, especially memory metrics, which aren't included in the default EC2 monitoring.
- Examine the console output for additional information when the issue occurs.
- Consider setting up CloudWatch alarms to notify you when status checks fail, allowing for quicker response.
- If the problem persists, you might want to implement an Auto Scaling group to automatically replace impaired instances or consider changing to a different instance type if the G5 is not optimal for your workload.
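As a rough illustration of the status-check alarm suggested above, here is a minimal Python sketch. It assumes boto3 is installed and credentials are configured; the alarm name, the one-minute period, the two evaluation periods, and the SNS topic ARN are placeholder choices of mine, not values from your setup. The helper names `build_status_check_alarm` and `create_alarm` are hypothetical.

```python
def build_status_check_alarm(instance_id, sns_topic_arn):
    """Return keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm fires when the EC2 StatusCheckFailed metric is >= 1
    for two consecutive one-minute periods (assumed thresholds).
    """
    return {
        "AlarmName": f"ec2-status-check-failed-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the notification
    }


def create_alarm(instance_id, sns_topic_arn):
    """Create the alarm in the account/region of the default credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_status_check_alarm(instance_id, sns_topic_arn))
```

You would call `create_alarm("<your-instance-id>", "<your-sns-topic-arn>")` once; building the parameters in a separate function makes it easy to review them before anything is created.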
The fact that stopping and starting the instance resolves the issue suggests the problem might be related to the underlying host system, as this operation typically migrates your instance to a new host.
Hello.
Can you check the syslog or sshd logs for the time period when the connection failed?
You may be able to see some kind of error.
You mentioned you already checked the CPU and memory of the EC2 instance. Regarding memory, I assume you have set up custom metrics; otherwise it won't show up in CloudWatch.
When you cannot log in, are you connecting through SSM or through RDP/SSH? Can you try SSM and see if that works? That would help confirm whether this is a connectivity issue. Also, is the application itself working fine during the time you cannot log in?
Have you checked the status checks of the EC2 instance? They can tell you whether the instance is down.
Finally, can you check the system log and the instance screenshot? These options are available via EC2 > Actions dropdown > Monitor and Troubleshoot.
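If you would rather pull the status checks and the instance screenshot from a script than from the console, something along these lines might work. This is only a sketch: it assumes boto3 is installed with credentials configured, and the helper names (`summarize_status`, `decode_screenshot`, `collect_diagnostics`) and the output path are hypothetical.

```python
import base64


def summarize_status(response):
    """Map instance ID -> (system status, instance status).

    `response` is the dict returned by EC2 describe_instance_status.
    """
    return {
        s["InstanceId"]: (s["SystemStatus"]["Status"], s["InstanceStatus"]["Status"])
        for s in response.get("InstanceStatuses", [])
    }


def decode_screenshot(image_data_b64):
    """Decode the base64 ImageData field of get_console_screenshot into raw bytes."""
    return base64.b64decode(image_data_b64)


def collect_diagnostics(instance_id, screenshot_path):
    """Fetch status checks and save a console screenshot for one instance."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2")
    # IncludeAllInstances also reports instances that are not in the running state.
    status = ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )
    shot = ec2.get_console_screenshot(InstanceId=instance_id, WakeUp=True)
    with open(screenshot_path, "wb") as fh:
        fh.write(decode_screenshot(shot["ImageData"]))
    return summarize_status(status)
```

An `impaired` system status generally points at the underlying AWS host, while an `impaired` instance status points at the OS or network configuration inside the instance, so the pair can help narrow down where to look next.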
