AWS SSM Agent Fails to Restart After Patch Manager Reboot on Amazon Linux 2022

Hello Team,

We have an Amazon Linux 2022 instance that is patched using AWS Patch Manager with a custom patch baseline.

The baseline is configured as follows:

  1. Patches are automatically approved and applied immediately (0 days delay).
  2. It includes patches classified as Security, Bugfix, Enhancement, or Recommended with severity: Important, Critical, or Medium.
  3. Compliance level is set to HIGH.
  4. Patching is managed via a maintenance window.

Issue:

  1. The maintenance window task times out with "Execution Timed Out".
  2. However, logs show that all patches were successfully installed.
  3. During the patching process, the system reboots as expected based on the patching policy.
  4. SSM Agent is terminated during the reboot and does not restart automatically.

Investigation So Far:

  1. The SSM Patch Manager logs confirm all patches were installed successfully.
  2. The SSM logs do not show any errors before the reboot.
  3. The SSM agent does not restart after reboot, making the instance unmanageable via SSM Session Manager.

asked a year ago, 591 views
2 Answers
An SSM Agent that does not come back online after a Patch Manager reboot on Amazon Linux 2022 can have several causes. Some prevent the agent process from starting at all; others let it start but stop it from registering with Systems Manager, which looks the same from the console. Here are the potential causes to check:

  1. IAM Role Permissions: Ensure that the IAM role associated with your EC2 instance has the necessary permissions to interact with Systems Manager. The role should have the "AmazonSSMManagedInstanceCore" managed policy attached or equivalent permissions.

  2. Instance Metadata Service (IMDS) Access: Verify that the EC2 instance has access to the Instance Metadata Service. This is crucial for retrieving temporary credentials to authenticate with AWS services.

  3. Security Group and VPC Configuration: Check that your instance's security group and VPC allow outbound HTTPS (port 443) traffic to the required Systems Manager endpoints, where <region> is the instance's AWS Region: ssm.<region>.amazonaws.com, ec2messages.<region>.amazonaws.com, and ssmmessages.<region>.amazonaws.com.

  4. VPC Endpoints: If your VPC doesn't have internet access, ensure you have the necessary VPC endpoints set up for Systems Manager.

  5. SSM Agent Configuration: Verify that the SSM Agent service (amazon-ssm-agent) is enabled so that it starts automatically on boot. If the service is not enabled, the agent stays down after every reboot until someone starts it manually.

  6. Network Connectivity: Ensure there are no network issues preventing the SSM Agent from communicating with the Systems Manager service after reboot.

  7. OS-level Changes: Check if any OS-level changes during patching might be affecting the SSM Agent's ability to start automatically.
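The endpoint check in item 3 can be scripted and run on the instance after the reboot. The following is a minimal sketch, not an official AWS tool: the helper names (ssm_endpoints, can_connect, check_region) are hypothetical, and it assumes the standard <service>.<region>.amazonaws.com naming used in item 3. It only tests that a TCP connection on port 443 opens; it does not validate TLS, IAM permissions, or IMDS access.

```python
import socket


def ssm_endpoints(region):
    """Build the three Systems Manager hostnames listed in item 3."""
    services = ("ssm", "ec2messages", "ssmmessages")
    return [f"{svc}.{region}.amazonaws.com" for svc in services]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_region(region, probe=can_connect):
    """Map each endpoint hostname to whether the probe reached it."""
    return {host: probe(host) for host in ssm_endpoints(region)}
```

Calling check_region with the instance's Region returns a dictionary of hostname to True or False. A False entry points toward causes 3, 4, or 6 (security group, VPC endpoint, or general network problems) rather than a problem with the agent service itself.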

To resolve this issue:

  1. Manually restart the SSM Agent after reboot if it doesn't start automatically.
  2. Configure the SSM Agent to start automatically on boot. Amazon Linux 2022 uses systemd, so this means enabling the agent's service unit.
  3. Review and update the IAM role permissions, security group rules, and network configurations as needed.
  4. Consider implementing a custom script or user data that checks and restarts the SSM Agent if it's not running after a reboot.
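Steps 1, 2, and 4 above can be combined into one small watchdog script. This is a sketch under stated assumptions, not a tested fix for this specific instance: it assumes the systemd unit is named amazon-ssm-agent, that the script runs as root, and the function names (is_active, ensure_running) are hypothetical. The run parameter exists only so the logic can be exercised without touching a real system.

```python
import subprocess

SERVICE = "amazon-ssm-agent"  # assumed systemd unit name for the SSM Agent


def is_active(run=subprocess.run, service=SERVICE):
    """Return True if systemd reports the unit as active."""
    result = run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def ensure_running(run=subprocess.run, service=SERVICE):
    """Enable and restart the unit if needed; return the actions taken."""
    actions = []
    # A non-zero exit code from is-enabled means the unit will not start on boot.
    enabled = run(["systemctl", "is-enabled", "--quiet", service])
    if enabled.returncode != 0:
        run(["systemctl", "enable", service])
        actions.append("enabled")
    if not is_active(run, service):
        run(["systemctl", "restart", service])
        actions.append("restarted")
    return actions
```

Calling ensure_running() from a boot-time hook (for example a cron @reboot entry or a small systemd unit of your own) returns an empty list when the agent is already healthy. Note that this works around the symptom; if the script reports "restarted" after every patch cycle, the agent logs are still worth checking for the underlying cause.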

If the problem persists after trying these solutions, you may need to open a support ticket with AWS for further assistance, as there could be underlying issues specific to your environment or the Amazon Linux 2022 configuration.

Sources
statuscheckfailed_instance still unreachable after reboot and shutdown restart | AWS re:Post
Why my instance stop working? | AWS re:Post
SSM Agent is not online. The SSM Agent was unable to connect to a Systems Manager endpoint to register itself with the service. | AWS re:Post

answered a year ago
EXPERT
reviewed a year ago

Hello.

Can you restart the SSM Agent manually after the reboot?
If so, do any errors occur when it restarts?
Please enable debug mode and check the logs described in the documents below.
https://repost.aws/knowledge-center/ssm-agent-logs
https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent-logs.html
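Once the logs are available, a short script can pull out the warning and error entries written around the reboot. The sketch below is illustrative only: the function name error_lines is hypothetical, the paths are the default SSM Agent log locations on Linux, and it assumes each entry carries its level (such as ERROR or WARN) as a separate word near the start of the line, which may not hold if the log format has been customized.

```python
# Default SSM Agent log locations on Linux; reading them usually needs root.
LOG_PATHS = [
    "/var/log/amazon/ssm/amazon-ssm-agent.log",
    "/var/log/amazon/ssm/errors.log",
]


def error_lines(text, levels=("ERROR", "WARN")):
    """Return the lines whose first few words include one of the levels."""
    matches = []
    for line in text.splitlines():
        # Assumption: the level appears within the first four tokens
        # (after the date and time).
        if any(token in levels for token in line.split()[:4]):
            matches.append(line)
    return matches


def scan_logs(paths=LOG_PATHS):
    """Collect matching lines per file, skipping files that cannot be read."""
    found = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                found[path] = error_lines(handle.read())
        except OSError:
            continue
    return found
```

Running scan_logs() on the instance returns a dictionary keyed by log path. Entries timestamped just after the reboot are the most relevant for answering the two questions above.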

EXPERT
answered a year ago
