EC2 Instance Stops Shortly After Boot Due to auditd Disk-Full Halt Behavior
Amazon EC2 instances that stop cleanly a few minutes after startup are often misinterpreted as experiencing infrastructure instability or hypervisor issues. In many cases, however, this behavior is caused by policy enforcement inside the guest operating system, not by AWS infrastructure. This article explains how the Linux auditd subsystem can intentionally halt an EC2 instance when disk or log space thresholds are reached, why this shutdown may not generate EC2 status check failures, and what console logs may or may not show.
Overview of the Problem
A common symptom pattern looks like this:
- The EC2 instance launches successfully
- The operating system boots normally
- Filesystems mount and services begin starting
- The instance stops cleanly within 1–3 minutes
- Restarting the instance reproduces the behavior
From the AWS perspective, the instance simply transitions to a stopped state without errors.
This behavior is not caused by EC2, EBS, or the underlying host. Instead, the operating system is deliberately powering itself off in response to a configured policy, most commonly an auditd disk-space action.
Why No EC2 System or Instance Status Check Failures Occur
Important: An OS-initiated, orderly shutdown (such as one triggered by auditd policy enforcement) may not generate any EC2 System Status Check or Instance Status Check failures. From the AWS control plane, this type of shutdown is indistinguishable from a user-initiated poweroff, even though it is the result of automated guest OS behavior.
This scenario often causes confusion because no EC2 status check failures are reported.
This is expected behavior:
- System Status Checks validate AWS-managed infrastructure (power, host hardware, networking)
- Instance Status Checks validate the hypervisor's ability to communicate with the guest OS
In an auditd-triggered shutdown:
- The OS remains healthy
- The kernel does not panic
- The shutdown is orderly and intentional
As a result, the instance passes both status checks right up until it powers itself off. AWS correctly reports no impairment because the infrastructure is functioning as designed.
How auditd Can Halt the Operating System
The Linux audit subsystem is designed to guarantee audit log integrity. Its behavior is controlled by configuration in:
/etc/audit/auditd.conf
Key parameters include:
- space_left
- space_left_action
- admin_space_left
- admin_space_left_action
- disk_full_action
- disk_error_action
In hardened or compliance-focused environments, these actions are sometimes configured as:
disk_full_action = HALT
admin_space_left_action = HALT
When auditd determines that:
- the audit log filesystem is full, or
- free space falls below a defined threshold,
it may intentionally halt the operating system to preserve audit integrity.
This is a clean shutdown initiated by userspace, not a crash.
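To see quickly whether a host is configured this way, the check can be scripted. The following is a minimal sketch, not an official tool: the function name list_halt_actions is illustrative, and it assumes the file uses plain "key = value" lines. Matching is done case-insensitively here.

```shell
#!/usr/bin/env bash
# List every *_action setting whose value is HALT.
# $1: path to the config file (defaults to /etc/audit/auditd.conf).
list_halt_actions() {
    local conf="${1:-/etc/audit/auditd.conf}"
    awk -F= '
        /^[ \t]*#/ { next }                      # skip comment lines
        NF >= 2 {
            key = tolower($1); val = tolower($2)
            gsub(/[ \t]/, "", key)               # strip padding around key
            gsub(/[ \t]/, "", val)               # strip padding around value
            if (key ~ /_action$/ && val == "halt") print key
        }
    ' "$conf"
}
```

Running list_halt_actions with no argument reads /etc/audit/auditd.conf (usually as root). Any setting it prints is one that can power the instance off.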
Why This Appears to Be an EC2 Problem
From the AWS control plane:
- The instance stops without warning
- No infrastructure errors are reported
- Restarting the instance repeats the behavior
Because EC2 reflects the guest OS state, an OS-initiated poweroff appears identical to a user-initiated shutdown from the AWS control plane. This can make it difficult to distinguish infrastructure events from guest operating system behavior without examining OS-level logs.
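One control-plane detail that can help narrow things down is the StateReason code that EC2 records for a stopped instance. It separates a stop requested through the EC2 API from a shutdown that began inside the guest, although it cannot tell an auditd halt apart from someone running poweroff in the OS. The sketch below is illustrative only: explain_stop_reason is a hypothetical helper, and the instance ID in the comment is a placeholder.

```shell
#!/usr/bin/env bash
# Map an EC2 StateReason code to a short interpretation.
explain_stop_reason() {
    case "$1" in
        Client.InstanceInitiatedShutdown)
            echo "guest-initiated: the OS powered itself off" ;;
        Client.UserInitiatedShutdown)
            echo "api-initiated: stop requested through the EC2 API or console" ;;
        *)
            echo "other: $1" ;;
    esac
}

# Example (requires the AWS CLI and credentials; the instance ID is a placeholder):
#   code=$(aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
#            --query 'Reservations[0].Instances[0].StateReason.Code' --output text)
#   explain_stop_reason "$code"
```

A guest-initiated result is consistent with an auditd halt but does not prove it. The OS-level checks are still needed to identify the trigger.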
What Instance Console Logs May (and May Not) Show
What Console Logs Can Show
Instance console output often captures the shutdown sequence, including:
- systemd stopping services
- Filesystems being unmounted
- The system reaching poweroff.target
Messages such as:
Reached target System Shutdown
Shutting down.
These messages confirm that a clean shutdown occurred.
What Console Logs Often Do Not Show
Console logs typically do not capture the root trigger, such as:
- auditd policy decisions
- disk-space threshold evaluations
- watchdog or compliance logic
Those details are usually recorded in:
- the system journal (journald)
- audit logs
- application or HA-specific logs
Important note: In some cases—especially when shutdown happens quickly—console output may be minimal or empty. This is normal behavior and does not rule out an OS-initiated shutdown.
Interpreting "A stop job is running for …" Messages
During shutdown, systemd stops services in dependency order. This commonly produces messages such as:
A stop job is running for CrowdStrike Falcon Sensor
A stop job is running for Oracle High Availability Services
A stop job is running for LSB: Start and Stop Oracle High Availability Service
These messages are a consequence of the shutdown, not the cause.
They indicate that:
- The system is already shutting down
- systemd is waiting for services to exit cleanly
They do not indicate that the listed service initiated or caused the shutdown.
How to Confirm auditd as the Shutdown Trigger
Check the Previous Boot's Journal
journalctl -b -1 | grep -Ei 'audit|shutdown|poweroff|halt'
Inspect auditd Configuration
grep -E 'space_left|admin_space_left|disk_full' /etc/audit/auditd.conf
Check Disk and Inode Usage
df -h
df -ih
Audit logs are typically stored under:
/var/log/audit
If this path shares a filesystem with /, audit logs can exhaust root disk space.
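Whether the two paths share a filesystem can be checked by comparing their device numbers. This is a rough sketch that assumes GNU coreutils stat (the -c option). The function name same_filesystem is illustrative.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) when both paths are on the same filesystem,
# exit 1 when they are not, and exit 2 when a path cannot be read.
same_filesystem() {
    local dev_a dev_b
    dev_a=$(stat -c '%d' -- "$1") || return 2   # %d = device number
    dev_b=$(stat -c '%d' -- "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

For example, `same_filesystem / /var/log/audit && echo "audit logs share the root filesystem"` prints the warning only when the audit directory is not on its own mount.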
Why Deleting Audit Logs Appears to Fix the Issue
Removing audit logs reduces disk usage and prevents auditd from triggering its halt action, allowing the instance to remain running.
However, this only masks the symptom and may:
- Remove forensic evidence
- Violate compliance requirements
- Allow the issue to recur
Proper configuration is required for a durable fix.
Recommended auditd Configuration for EC2 Environments
Isolate Audit Logs
- Place /var/log/audit on a dedicated filesystem or logical volume
- Size it for expected log volume and retention
Use Non-Halting Actions with Alerting
Example baseline (adjust for your requirements):
max_log_file = 100
num_logs = 20
max_log_file_action = ROTATE
space_left = 25%
space_left_action = SYSLOG
admin_space_left = 10%
admin_space_left_action = SYSLOG
disk_full_action = ROTATE
disk_error_action = SYSLOG
This preserves auditing while preventing unplanned host shutdowns.
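As a rough sizing aid for this baseline: max_log_file is expressed in megabytes, so 20 rotated files of 100 MB can grow to about 2000 MB. For that full set to fit without crossing a space_left threshold of 25%, the audit filesystem needs at least 2000 / 0.75, or about 2667 MB. The sketch below repeats that arithmetic for any config file. The function names conf_value and audit_fs_sizing are illustrative, and it assumes that space_left is written as a percentage and that the filesystem holds only audit logs.

```shell
#!/usr/bin/env bash
# Print the value of one "key = value" setting from an auditd.conf-style file.
# $1: config file, $2: lowercase key name.
conf_value() {
    awk -F= -v want="$2" '
        /^[ \t]*#/ { next }                               # skip comments
        { key = tolower($1); gsub(/[ \t]/, "", key) }
        key == want { val = $2; gsub(/[ \t]/, "", val); print val; exit }
    ' "$1"
}

# Print the rotated-log budget and the smallest filesystem size (in MB)
# that keeps a full rotation set above the space_left percentage.
audit_fs_sizing() {
    local conf="$1" size count pct budget
    size=$(conf_value "$conf" max_log_file)
    count=$(conf_value "$conf" num_logs)
    pct=$(conf_value "$conf" space_left)
    case "$pct" in
        *%) pct=${pct%\%} ;;                              # drop the percent sign
        *)  echo "space_left is not a percentage" >&2; return 1 ;;
    esac
    budget=$(( size * count ))
    # Integer ceiling of budget / (1 - pct/100).
    echo "budget_mb=$budget min_fs_mb=$(( (budget * 100 + 99 - pct) / (100 - pct) ))"
}
```

Treat the result as a lower bound: filesystem overhead and growth headroom are not included.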
Enable Monitoring
- Disk and inode utilization alarms
- auditd service health
- Unexpected instance stops
Key Takeaways
- A clean EC2 shutdown shortly after boot is often initiated by the guest OS rather than an underlying AWS infrastructure issue
- auditd can intentionally halt a Linux system when disk thresholds are reached
- This behavior may occur without any EC2 status check failures
- Console logs often show the shutdown sequence or symptoms, rather than the condition that triggered the shutdown
- Proper auditd configuration helps prevent availability impact while maintaining compliance requirements
Conclusion
When running hardened or compliance-driven Linux workloads on Amazon EC2, auditd configuration must be aligned with disk layout and operational monitoring. Misconfigured audit halt actions can cause repeated, silent instance shutdowns that appear infrastructure-related but are entirely controlled by the guest OS. Correctly tuning auditd policies ensures both audit integrity and system availability.
