Your c7a.medium Ubuntu LTS EC2 instances losing network connectivity after CPU or memory spikes could have several causes. Based on the information provided, here are the most likely explanations:
- Resource Exhaustion: The CPU or memory spikes may be exhausting the instance's resources and starving critical system processes, which can show up as lost network connectivity.
- DHCP Issues: The reports of systemd-networkd failing to renew DHCP leases point to a problem with the DHCP client on your instances. High system load could make this worse, for example if the renewal is delayed until after the lease has expired.
- Kernel or Driver Issues: An underlying kernel or network driver problem may only manifest under high load.
- snapd and Unattended Upgrades: The correlation with snapd refreshes and unattended upgrades suggests that these processes may be triggering the issue, possibly through resource contention or by restarting or conflicting with network-related services.
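One way to test the snapd and unattended-upgrades correlation is to compare the times of the network failures with the times those jobs ran. The following is a minimal sketch, not a definitive tool: the function name, the 10-minute window, and all timestamps are hypothetical placeholders, and you would substitute times taken from your own logs.

```python
from datetime import datetime, timedelta


def failures_near_maintenance(failures, maintenance, window=timedelta(minutes=10)):
    """Return (failure_time, maintenance_time) pairs where the failure
    occurred within `window` after a maintenance event started."""
    pairs = []
    for failure in sorted(failures):
        for event in sorted(maintenance):
            # Only count failures that happen at or after the event start.
            if timedelta(0) <= failure - event <= window:
                pairs.append((failure, event))
                break
    return pairs


# Hypothetical timestamps for illustration only; replace them with the
# times you see for snapd refreshes / unattended upgrades and for the
# network outages on your own instances.
maintenance = [datetime(2024, 1, 10, 3, 0), datetime(2024, 1, 11, 3, 0)]
failures = [
    datetime(2024, 1, 10, 3, 4),
    datetime(2024, 1, 10, 15, 30),
    datetime(2024, 1, 11, 3, 7),
]

matched = failures_near_maintenance(failures, maintenance)
print(f"{len(matched)} of {len(failures)} failures followed a maintenance event")
```

If most failures fall inside the window, that supports the idea that the refresh or upgrade jobs are the trigger. If they do not, the other causes above become more likely.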
To troubleshoot and potentially resolve this issue:
- Monitor Resource Usage: Use CloudWatch to closely monitor CPU, memory, and network usage, and set up alarms to alert you when these metrics spike. Note that EC2 does not publish memory metrics by default, so memory monitoring requires the CloudWatch agent on the instance.
- Check System Logs: Examine /var/log/syslog, /var/log/dmesg, and journalctl output (for example, journalctl -u systemd-networkd) for errors or warnings that occur just before or during the network outages.
- Update the System: Ensure that your Ubuntu LTS release and all packages are up to date. There may be known issues that have been resolved in newer versions.
- Adjust Network Settings: Consider tuning network-related settings, such as network buffer sizes or the DHCP client configuration. Keep in mind that the DHCP lease duration is normally set by the DHCP server rather than the instance, so client-side changes mainly affect how renewals are handled.
- Modify snapd and Unattended Upgrades: Reschedule snapd refreshes and unattended upgrades to periods of lower system load, or temporarily disable them to see whether the outages stop.
- Instance Type Consideration: If the problem persists, test with a different instance type to see whether the issue is specific to c7a.medium instances.
- AWS Support: If the problem continues after trying these steps, consider engaging AWS Support, especially if you suspect the underlying EC2 infrastructure.
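Because remote monitoring may stop reporting once the instance loses its network, it can help to also record resource usage locally. The sketch below assumes a Linux instance and reads /proc/meminfo and /proc/loadavg; the function names, the log file name, and the 10-second interval are illustrative choices, not part of any AWS or Ubuntu tooling.

```python
import time
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value
    (most fields are in kB; a few, such as HugePages counts, are unitless)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_memory_percent(text):
    """Percentage of total memory the kernel reports as available."""
    info = parse_meminfo(text)
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def sample_line(meminfo_text, loadavg_text, now=None):
    """Build one log line: timestamp, 1-minute load average, available memory %."""
    ts = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    load1 = float(loadavg_text.split()[0])
    pct = available_memory_percent(meminfo_text)
    return f"{ts} load1={load1:.2f} mem_available_pct={pct:.1f}"


def run(log_path="resource-samples.log", interval_s=10):
    """Append a sample every `interval_s` seconds until interrupted (Linux only).
    The path and interval are hypothetical defaults."""
    with open(log_path, "a") as log:
        while True:
            line = sample_line(
                Path("/proc/meminfo").read_text(),
                Path("/proc/loadavg").read_text(),
            )
            log.write(line + "\n")
            log.flush()  # keep samples on disk even if the instance hangs
            time.sleep(interval_s)
```

Calling run() on an affected instance leaves a local record that you can compare against the outage times after the instance is reachable again.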
Remember to document all your troubleshooting steps and their outcomes. This will be valuable information if you need to escalate the issue to AWS support or if you encounter similar problems in the future.