Hello Matt,
The issue seems to be a temporary loss of connectivity between the EC2 instance and the AWS Systems Manager service, which prevents the Session Manager from establishing a connection.
To resolve it:
- Check the SSM Agent logs (/var/log/amazon/ssm/amazon-ssm-agent.log) for any error messages or clues.
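- Ensure the SSM Agent is running on the instance (sudo systemctl status amazon-ssm-agent on systemd-based distributions such as Amazon Linux 2, or sudo status amazon-ssm-agent on Amazon Linux 1). If it is not, restart it (sudo systemctl restart amazon-ssm-agent, or sudo restart amazon-ssm-agent on Amazon Linux 1).
- Verify that the instance has outbound connectivity, through the internet or through VPC endpoints, to the Systems Manager endpoints on port 443: ssm.<region>.amazonaws.com, ssmmessages.<region>.amazonaws.com, and ec2messages.<region>.amazonaws.com, where <region> is the instance's AWS Region (for example, us-east-1). Check security group rules, NAT gateways, and VPC endpoint configurations.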
- If the issue persists, try restarting the EC2 instance or updating the SSM Agent to the latest version.
- If the problem continues, open a case with AWS Support, who can investigate the issue more thoroughly based on your specific environment and configuration.
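If it helps, the endpoint check can be scripted and run on the affected instance. This is only a minimal sketch using the Python standard library: it assumes the standard AWS partition (hostnames ending in amazonaws.com), and the function names are made up for illustration. A successful TCP connection only shows that port 443 is reachable; it says nothing about IAM permissions or the agent itself.

```python
import socket

# The three regional services the SSM Agent talks to.
SERVICES = ("ssm", "ssmmessages", "ec2messages")


def ssm_endpoints(region):
    """Return the regional endpoint hostnames for the given region."""
    return [f"{service}.{region}.amazonaws.com" for service in SERVICES]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_region(region):
    """Map each endpoint hostname to whether it was reachable."""
    return {host: can_connect(host) for host in ssm_endpoints(region)}
```

Calling check_region("us-east-1") (swap in your own Region) returns a dictionary of hostname to True/False. Running it from cron and logging the result may help catch a connectivity drop that only happens occasionally.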
Hello.
To resolve EC2 instances periodically disconnecting from AWS Systems Manager (SSM), make sure the SSM Agent is up to date and running on all instances. Each instance must have an IAM role attached with the necessary permissions (for example, the AmazonSSMManagedInstanceCore managed policy). If the instances are in a VPC without internet access, set up VPC endpoints for SSM to allow communication. Check the SSM Agent regularly and restart it if needed, and monitor each instance's network connectivity and resource usage. This helps maintain a continuous connection with SSM.
https://repost.aws/knowledge-center/ssm-session-manager-failures
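To confirm the IAM part without clicking through the console, something like the following could work. It is a sketch, not a complete permissions audit: it only looks for the AmazonSSMManagedInstanceCore managed policy in the standard AWS partition, so a role that grants the same permissions through a custom or inline policy would be reported as missing it. The function names are illustrative, and the second function assumes boto3 is installed and credentials are configured.

```python
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def has_ssm_core_policy(attached_policies):
    """Return True if the managed policy appears in a list of attached policies.

    `attached_policies` has the shape of the "AttachedPolicies" list returned
    by IAM ListAttachedRolePolicies: dicts with "PolicyName" and "PolicyArn".
    """
    return any(p.get("PolicyArn") == SSM_CORE_POLICY_ARN for p in attached_policies)


def role_has_ssm_core_policy(role_name):
    """Fetch the role's attached managed policies with boto3 and check them."""
    import boto3  # imported lazily so the pure helper works without boto3

    iam = boto3.client("iam")
    policies = []
    paginator = iam.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        policies.extend(page["AttachedPolicies"])
    return has_ssm_core_policy(policies)
```

Pass the name of the role behind the instance profile, not the instance profile name itself.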
Thanks. It's random: an instance will be connected for weeks and then just drop out, while other servers on the same setup are fine, so it feels very random. I will check all of that again.
The issue you're experiencing with instances periodically disconnecting from Session Manager is indeed a common problem, and there are several potential causes and solutions to consider:
- SSM Agent version: Ensure that you're running the latest version of the SSM Agent on your instances. AWS regularly updates the agent to fix bugs and improve stability. Session Manager requires version 2.3.68.0 or later.
- Network connectivity: While you've mentioned that routing rules seem fine, it's worth double-checking your VPC configuration, including security groups and network ACLs, to ensure there are no intermittent issues blocking outbound traffic to the Systems Manager endpoints.
- IAM permissions: Verify that the IAM role attached to your EC2 instances has the necessary permissions for Session Manager. Sometimes, role permissions can be inadvertently modified, causing periodic disconnections.
- VPC endpoints: If your instances are in a VPC without direct internet access, consider setting up VPC endpoints for Systems Manager using AWS PrivateLink. This can improve the reliability of the connection between your instances and the Systems Manager service.
- Operating system limits: Some operating systems have limits on the number of file descriptors or open files. Increasing these limits might help if the SSM Agent is hitting these boundaries.
- CloudWatch monitoring: Set up CloudWatch alarms to monitor the status of your SSM Agent and get notified when instances become disconnected. This can help you respond more quickly to issues.
- Automation runbook: Consider using the AWSSupport-TroubleshootSessionManager automation runbook. This runbook can help diagnose issues by checking the instance status, SSM Agent version, IAM policy attachments, and connectivity to various endpoints.
- Regular restarts: If the issue persists, you might want to consider implementing a scheduled task to restart the SSM Agent periodically, which can sometimes resolve connectivity issues.
- AWS Support: If none of these solutions work, it may be worth contacting AWS Support, as there could be underlying issues with the AWS infrastructure or services that are causing the disconnections.
Remember to test any changes in a non-production environment first, and always follow AWS best practices for managing your EC2 instances and SSM Agent.
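For the version and monitoring points, a small script can list which managed instances currently look unhealthy. The sketch below assumes boto3 is installed and that the caller is allowed to call ssm:DescribeInstanceInformation; the helper names are made up for illustration. It flags any instance whose ping status is not Online or whose agent is older than the 2.3.68.0 minimum mentioned above.

```python
MIN_AGENT_VERSION = (2, 3, 68, 0)  # minimum version quoted in the list above


def parse_version(version):
    """Turn a dotted version string such as "3.2.582.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_attention(instance_info):
    """Return a list of reasons this managed instance should be looked at.

    `instance_info` has the shape of one entry in "InstanceInformationList"
    from SSM DescribeInstanceInformation.
    """
    reasons = []
    status = instance_info.get("PingStatus")
    if status != "Online":
        reasons.append(f"ping status is {status}")
    version = instance_info.get("AgentVersion")
    if version and parse_version(version) < MIN_AGENT_VERSION:
        reasons.append(f"agent version {version} is older than 2.3.68.0")
    return reasons


def scan_fleet():
    """Return {instance id: [reasons]} for every instance that needs a look."""
    import boto3  # imported lazily so the helpers above work without boto3

    ssm = boto3.client("ssm")
    findings = {}
    paginator = ssm.get_paginator("describe_instance_information")
    for page in paginator.paginate():
        for info in page["InstanceInformationList"]:
            reasons = needs_attention(info)
            if reasons:
                findings[info["InstanceId"]] = reasons
    return findings
```

Running scan_fleet() on a schedule and recording the output gives you a timeline of when each instance dropped out, which is useful evidence if you end up opening a support case.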
In short, for intermittent SSM Agent disconnections: check the logs, network, and resource usage, compare affected and unaffected instances, and restart the agent periodically as a workaround.
The intermittent and random nature of the issue does make it more challenging to troubleshoot.
Since it affects some instances but not others on the same setup, it could point to an instance-specific issue rather than a broader networking or configuration problem.
Check if there are any patterns - does it happen more frequently on instances of a certain type, AMI, or launch configuration? This may provide clues.
Review any recent changes made to the affected instances, such as software updates or configuration changes, that could have introduced the issue.
Monitor resource utilization (CPU, memory, disk) when the issue occurs to see if resource constraints are causing the SSM Agent to disconnect.
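As a workaround, you could restart the SSM Agent service periodically to re-establish connectivity, for example with an Amazon EventBridge (formerly CloudWatch Events) scheduled rule that invokes Run Command. Keep in mind that Run Command only reaches instances whose agent is still connected, so a local cron job or systemd timer is more dependable for an instance that has already dropped out.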
Enable and review VPC Flow Logs for deeper network traffic analysis when the disconnections happen.
The randomness does make it tricky, but paying close attention to any patterns, recent changes, and resource usage on the affected instances may reveal the root cause. Don't hesitate to engage AWS Support if the issue persists despite your troubleshooting efforts.
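If you keep a simple inventory of your instances and note which ones have dropped out, looking for patterns can be done in a few lines. This is a rough sketch with made-up field names ("instance_type", "ami", "disconnected"); adapt them to however you record your fleet.

```python
from collections import Counter


def disconnect_rate_by(instances, attribute):
    """Return {attribute value: (disconnected count, total count)}.

    `instances` is a list of dicts; each dict should carry the attribute
    being grouped on and a truthy "disconnected" flag if it has dropped out.
    """
    totals = Counter()
    disconnected = Counter()
    for inst in instances:
        key = inst.get(attribute, "unknown")
        totals[key] += 1
        if inst.get("disconnected"):
            disconnected[key] += 1
    return {key: (disconnected[key], totals[key]) for key in totals}
```

If one AMI or instance type shows most of the disconnections while the others show none, that is a good place to start digging.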