Analysis of Logs
Memory Allocation Failure:
The mmap() failed: [12] Cannot allocate memory error in the HTTPD logs (errno 12 is ENOMEM) indicates that the server tried to allocate memory and could not. On its own, this does not normally cause a system reboot; it is more likely to make the HTTPD service crash or restart worker processes.
Kernel Logs:
NMI watchdog: Perf event create on CPU 0 failed with -2: The -2 here is -ENOENT, which usually means the kernel could not find a hardware performance counter to drive the NMI (Non-Maskable Interrupt) hard-lockup watchdog. This is common on virtualized EC2 instances and typically appears at every boot. When the watchdog is active and configured to panic, it can reboot a system it considers hung, but this particular message is usually informational rather than the cause of a reboot.
acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM: This ACPI (Advanced Configuration and Power Interface) message means the platform firmware does not expose the _OSC method for the PCI root bridge, so the kernel disabled ASPM (PCIe Active State Power Management). On virtualized instances this is often seen at every boot and is usually harmless, but repeated or more severe ACPI errors can point to a firmware problem or an ACPI configuration issue in the kernel.
SSM Agent Errors:
The amazon-ssm-agent errors about failing to load instance info and failed health pings most likely mean the agent could not reach the AWS back-end services while the instance was going down or coming back up. These errors are a symptom of the reboot rather than its cause.
OOM (Out of Memory) Handling:
The panic_on_oom kernel parameter is set to 1, meaning that the system will panic and reboot if an out-of-memory (OOM) condition occurs. Even though there were no explicit OOM messages in the logs, the HTTPD memory allocation failure could suggest a low-memory situation that triggered a kernel panic.
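If you want to confirm how the instance is currently configured, the relevant sysctls can be read directly from /proc. Below is a minimal sketch, assuming a Linux instance with Python 3; note that a panic only turns into an automatic reboot when kernel.panic is set to a non-zero number of seconds.

```python
# Sketch: print the OOM/panic-related kernel parameters from /proc/sys.
# These files are world-readable, so no special privileges are needed.
from pathlib import Path

def read_sysctl(name: str) -> str:
    # e.g. "vm.panic_on_oom" maps to /proc/sys/vm/panic_on_oom
    return Path("/proc/sys", *name.split(".")).read_text().strip()

for key in ("vm.panic_on_oom", "kernel.panic", "vm.overcommit_memory"):
    print(f"{key} = {read_sysctl(key)}")
```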
Determining the Cause
To determine the root cause, you can take the following steps:
Check CloudWatch Logs and Metrics:
Review CloudWatch and the EC2 console for reboot-related events and the instance's status check history. Look for any System Status Check or Instance Status Check failures around the time of the reboots.
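If you prefer to pull the status checks programmatically, a minimal boto3 sketch follows; the region and instance ID are placeholders, and it assumes boto3 is installed and credentials are configured.

```python
# Sketch: read the instance's system and instance status checks with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust region as needed
resp = ec2.describe_instance_status(
    InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
    IncludeAllInstances=True,             # also report stopped/impaired instances
)

for status in resp["InstanceStatuses"]:
    print("System status:  ", status["SystemStatus"]["Status"])
    print("Instance status:", status["InstanceStatus"]["Status"])
```

Note that this call only shows the current state; historical failures show up in the StatusCheckFailed_System and StatusCheckFailed_Instance CloudWatch metrics, which the monitoring sketch further down also queries.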
Inspect System Logs (dmesg):
Check the dmesg/kernel logs for any messages leading up to the reboot, such as kernel panics, OOM killer activity, or hardware-related errors. Keep in mind that dmesg only covers the current boot, so messages from before an unexpected reboot are only available if the journal is persisted to disk or via the EC2 instance console output.
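One way to scan the previous boot's kernel log is sketched below; it assumes systemd-journald with persistent storage enabled (Storage=persistent in /etc/systemd/journald.conf). If the journal is not persistent, use the EC2 console output or serial console logs instead.

```python
# Sketch: grep the kernel log of the previous boot for panic/OOM/hardware hints.
import subprocess

out = subprocess.run(
    ["journalctl", "-k", "-b", "-1", "--no-pager"],  # kernel messages, previous boot
    capture_output=True, text=True, check=False,
).stdout

keywords = ("panic", "out of memory", "oom-killer", "hardware error", "watchdog")
for line in out.splitlines():
    if any(k in line.lower() for k in keywords):
        print(line)
```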
AWS EC2 Monitoring:
Review the AWS EC2 monitoring metrics for the instance. Look for any unusual CPU, memory, or I/O activity just before the reboot.
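A minimal boto3 sketch for pulling the metrics around a reboot is below; the instance ID, region, and reboot timestamp are placeholders, and memory metrics only exist if the CloudWatch agent is publishing them (they live in the CWAgent namespace rather than AWS/EC2).

```python
# Sketch: pull CPU and status-check metrics for the hour around a reboot.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")        # adjust region
reboot_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # placeholder time

for metric in ("CPUUtilization", "StatusCheckFailed_System", "StatusCheckFailed_Instance"):
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric,
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        StartTime=reboot_time - timedelta(minutes=30),
        EndTime=reboot_time + timedelta(minutes=30),
        Period=60,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
    print(metric, [(p["Timestamp"].isoformat(), p["Maximum"]) for p in points])
```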
Potential Causes and Solutions
Kernel or Hardware Issue:
The ACPI and NMI watchdog messages are often benign on EC2, but if they are new or appear alongside other hardware errors they can point to a hardware problem or a kernel bug. Update the kernel to the latest stable version available for your OS. If the problem persists, stop and start the instance (which usually moves it to different underlying hardware) or launch a replacement instance.
Out of Memory (OOM):
If the instance ran out of memory, the kernel's panic_on_oom setting would cause a panic and reboot. Consider increasing the instance size or optimizing your application's memory usage. You could also set panic_on_oom back to 0 so that an OOM condition invokes the OOM killer (which terminates the offending process) instead of panicking the whole system.
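If you decide to change the setting, a minimal sketch is below; it must run as root, and the drop-in file name is just an example.

```python
# Sketch: disable panic_on_oom for the running kernel and persist the change.
from pathlib import Path

# Apply immediately (equivalent to: sysctl -w vm.panic_on_oom=0)
Path("/proc/sys/vm/panic_on_oom").write_text("0\n")

# Persist across reboots with a sysctl drop-in file (example file name)
Path("/etc/sysctl.d/90-oom.conf").write_text("vm.panic_on_oom = 0\n")
```

With panic_on_oom set to 0 the instance will stay up, but httpd worker processes can still be killed under memory pressure, so the monitoring suggestions below remain relevant.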
AWS Infrastructure Maintenance:
Sometimes, AWS may automatically reboot instances for maintenance purposes. Check the Event History in the EC2 console for any scheduled maintenance activities around the time of the reboot.
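The same describe_instance_status call used earlier also returns any scheduled events, which is the programmatic equivalent of the console's scheduled-events view; a sketch with placeholder region and instance ID follows.

```python
# Sketch: list scheduled events (e.g. system-reboot, system-maintenance) for the instance.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # adjust region
resp = ec2.describe_instance_status(
    InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
    IncludeAllInstances=True,
)

for status in resp["InstanceStatuses"]:
    for event in status.get("Events", []):
        print(event["Code"], event.get("NotBefore"), event.get("Description"))
```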
Preventive Measures
Instance Size Adjustment:
If memory allocation failures are common, consider upgrading to an instance type with more memory, for example a t3.micro or t3.small.
Monitoring and Alerts:
Set up CloudWatch alarms for CPU, memory, and disk utilization; note that memory and disk-usage metrics require the CloudWatch agent to be installed on the instance. Alarms can warn you before a resource bottleneck leads to a reboot.
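As one concrete example, the sketch below creates a CPU alarm with boto3; the alarm name, threshold, instance ID, and SNS topic ARN are placeholders, and memory or disk alarms would be built the same way against the CWAgent namespace.

```python
# Sketch: create a high-CPU alarm that notifies an SNS topic.
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")  # adjust region
cw.put_metric_alarm(
    AlarmName="ec2-high-cpu",                              # placeholder name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                          # 5-minute periods
    EvaluationPeriods=3,                 # alarm after 15 minutes above threshold
    Threshold=85.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder ARN
)
```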
Kernel Update:
Ensure that your instance's kernel and all related packages are up-to-date to prevent kernel-level issues.
Application Optimization:
Optimize your HTTPD server configuration to handle memory more efficiently. Consider tuning the number of worker threads/processes and reducing memory usage per worker.
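A common way to tune this is to size MaxRequestWorkers (prefork MPM) so that all workers fit into the memory that is actually available. The sketch below is only an estimate under stated assumptions: the process name is httpd, 20% of memory is kept as head-room, and per-process RSS double-counts shared pages, so the result is conservative.

```python
# Sketch: estimate a MaxRequestWorkers value from available memory and average httpd RSS.
import subprocess

def mem_available_kib() -> int:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

def avg_httpd_rss_kib() -> float:
    # Resident set size (KiB) of every httpd process, averaged
    out = subprocess.run(["ps", "-C", "httpd", "-o", "rss="],
                         capture_output=True, text=True, check=False).stdout
    sizes = [int(x) for x in out.split()]
    return sum(sizes) / len(sizes) if sizes else 0.0

available = mem_available_kib() * 0.8        # leave ~20% head-room (assumption)
per_worker = avg_httpd_rss_kib() or 1.0      # avoid division by zero if httpd is down
print(f"Suggested MaxRequestWorkers ~= {int(available // per_worker)}")
```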