Instance is not connected to session manager

0

We keep having periodic issues where our servers go offline showing the message "Instance is not connected to session manager".

The troubleshooting guide says: "Solution: SSM Agent is Amazon software that runs on Amazon EC2 instances so that Session Manager can connect to them. If you see this error, SSM Agent is unable to establish a connection with the Systems Manager endpoint. The source of the problem could be firewall restrictions, routing problems, or lack of internet connectivity. To resolve this issue, investigate network connectivity problems."

The servers are up and running and then just stop. They are not being blocked by an AWS firewall. The routing rules must be OK, since the servers stay up for weeks before stopping, which leaves lack of internet connectivity. I'm not sure how we'd solve that, as it would be on the AWS side.

I've read that it's a common problem, but does anyone know of a way to prevent working servers from suddenly disconnecting from Session Manager?

Thank you for your help!

asked a year ago · 7.3K views
3 Answers
0

Hello Matt,

The issue seems to be a temporary loss of connectivity between the EC2 instance and the AWS Systems Manager service, which prevents Session Manager from establishing a connection.

To resolve it:

  • Check the SSM Agent logs (/var/log/amazon/ssm/amazon-ssm-agent.log) for any error messages or clues.
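  • Ensure the SSM Agent is running on the instance (sudo systemctl status amazon-ssm-agent on systemd-based distributions such as Amazon Linux 2, or sudo status amazon-ssm-agent on Amazon Linux 1). If not, restart it (sudo systemctl restart amazon-ssm-agent or sudo restart amazon-ssm-agent, respectively).
  • Verify that the instance has internet connectivity and can reach the Systems Manager endpoints on port 443 (ssm.<region>.amazonaws.com, plus ssmmessages.<region>.amazonaws.com, which Session Manager uses). Check security group rules, NAT gateways, and VPC endpoint configurations.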
  • If the issue persists, try restarting the EC2 instance or updating the SSM Agent to the latest version.
  • If the problem continues, open a case with AWS Support, who can investigate the issue more thoroughly based on your specific environment and configuration.
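
The endpoint check in the list above can be scripted and run from an affected instance. The following is a minimal sketch, not an official AWS tool: the function names are made up for illustration, and it assumes a standard commercial region where the hostnames follow the service.region.amazonaws.com pattern. It only tests that a TCP connection opens on port 443, not that the agent can authenticate.

```python
import socket


def ssm_endpoints(region):
    """Return the regional hostnames SSM Agent talks to over HTTPS.

    Assumes the standard service.region.amazonaws.com naming pattern.
    """
    services = ("ssm", "ssmmessages", "ec2messages")
    return [f"{service}.{region}.amazonaws.com" for service in services]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_region(region):
    """Map each endpoint hostname to whether it was reachable."""
    return {host: can_connect(host) for host in ssm_endpoints(region)}
```

Calling check_region("eu-west-1") (the region name is only an example) on an instance that has dropped out shows whether the failure is at the network level. Running it from a scheduled job and logging the result may help catch an intermittent outage as it happens.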

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • Thanks. It feels very random: an instance will stay connected for weeks and then just drop out, while other servers on the same setup are fine. I'll check all of that again.

  • For intermittent SSM Agent disconnections: check the logs, network, and resource usage, compare affected instances with healthy ones, and restart the agent periodically.

  • The intermittent and random nature of the issue does make it more challenging to troubleshoot.

    1. Since it affects some instances but not others on the same setup, it could point to an instance-specific issue rather than a broader networking or configuration problem.

    2. Check if there are any patterns - does it happen more frequently on instances of a certain type, AMI, or launch configuration? This may provide clues.

    3. Review any recent changes made to the affected instances, like software updates, configuration changes, etc. That could have introduced the issue.

    4. Monitor resource utilization (CPU, memory, disk) when the issue occurs to see if resource constraints are causing the SSM Agent to disconnect.

    5. As a workaround, you could set up CloudWatch event rules to automatically restart the SSM Agent service periodically to re-establish connectivity.

    6. Enable and review VPC Flow Logs for deeper network traffic analysis when the disconnections happen.

    The randomness does make it tricky, but paying close attention to any patterns, recent changes, and resource usage on the affected instances may reveal the root cause. Don't hesitate to engage AWS Support if the issue persists despite your troubleshooting efforts.
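
One way to look for the patterns mentioned above is to count error lines in the agent log per hour and compare the busy hours against deployments, patching windows, or resource spikes. This is a rough sketch under an assumption: it expects each log line to start with a "YYYY-MM-DD HH:MM:SS" timestamp followed by a level such as ERROR. Check a few lines of your own log first, and adjust the slicing if the format differs.

```python
from collections import Counter

# Path taken from the answer above; adjust for your distribution.
DEFAULT_LOG = "/var/log/amazon/ssm/amazon-ssm-agent.log"


def error_counts_by_hour(lines):
    """Count ERROR lines, keyed by the first 13 characters of each line.

    Assumes lines start with 'YYYY-MM-DD HH:MM:SS', so the first
    13 characters are the date and hour ('YYYY-MM-DD HH').
    """
    counts = Counter()
    for line in lines:
        if " ERROR " in line and len(line) >= 13:
            counts[line[:13]] += 1
    return counts


def scan_log(path=DEFAULT_LOG):
    """Read the agent log and return per-hour ERROR counts."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return error_counts_by_hour(handle)
```

Running scan_log() on both an affected and a healthy instance, then comparing the two results, is a cheap way to tell whether the errors cluster around the disconnection time or are background noise.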

0

Hello.

To resolve this issue of EC2 instances periodically disconnecting from AWS Systems Manager (SSM), ensure that the SSM Agent is up to date and running on all instances. The instance must have an IAM role attached with the necessary permissions (for example, the AmazonSSMManagedInstanceCore managed policy). If the instances are in a private VPC without internet access, set up VPC endpoints for SSM to allow communication. Regularly check the SSM Agent and restart it if needed, and monitor the instance's network connectivity and resource usage. This will help maintain a continuous connection with SSM.

https://repost.aws/knowledge-center/ssm-session-manager-failures
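
The "check and restart the SSM Agent if needed" step could be automated with a small script run as root from cron or a systemd timer. The sketch below assumes a systemd-based instance where the unit is named amazon-ssm-agent (snap-based installs use a different unit name). The function name and the injectable run parameter are illustrative choices, not part of any AWS tooling.

```python
import subprocess

# Assumed unit name; snap-based installs use a different one.
SERVICE = "amazon-ssm-agent"


def ensure_agent_running(run=subprocess.run):
    """Restart the agent if systemd reports it as not active.

    Returns 'running' if the unit was already active, or 'restarted'
    if a restart was issued. The run parameter exists only so the
    function can be exercised without touching a real system.
    """
    # 'systemctl is-active --quiet' exits 0 only when the unit is active.
    status = run(["systemctl", "is-active", "--quiet", SERVICE])
    if status.returncode == 0:
        return "running"
    run(["systemctl", "restart", SERVICE], check=True)
    return "restarted"
```

Note the limit of this approach: it only catches an agent process that has stopped. An agent that is still running but has lost its connection will report as active, so this complements the network checks rather than replacing them.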

EXPERT
answered a year ago
EXPERT
reviewed a year ago
0

The issue you're experiencing with instances periodically disconnecting from Session Manager is indeed a common problem, and there are several potential causes and solutions to consider:

  1. SSM Agent version: Ensure that you're running the latest version of the SSM Agent on your instances. AWS regularly updates the agent to fix bugs and improve stability. Session Manager requires version 2.3.68.0 or later.

  2. Network connectivity: While you've mentioned that routing rules seem fine, it's worth double-checking your VPC configuration, including security groups and network ACLs, to ensure there are no intermittent issues blocking outbound traffic to the Systems Manager endpoints.

  3. IAM permissions: Verify that the IAM role attached to your EC2 instances has the necessary permissions for Session Manager. Sometimes, role permissions can be inadvertently modified, causing periodic disconnections.

  4. VPC endpoints: If your instances are in a VPC without direct internet access, consider setting up VPC endpoints for Systems Manager using AWS PrivateLink. This can improve the reliability of the connection between your instances and the Systems Manager service.

  5. Operating system limits: Some operating systems have limits on the number of file descriptors or open files. Increasing these limits might help if the SSM Agent is hitting these boundaries.

  6. CloudWatch monitoring: Set up CloudWatch alarms to monitor the status of your SSM Agent and get notified when instances become disconnected. This can help you respond more quickly to issues.

  7. Automation runbook: Consider using the AWSSupport-TroubleshootSessionManager automation runbook. This runbook can help diagnose issues by checking the instance status, SSM Agent version, IAM policy attachments, and connectivity to various endpoints.

  8. Regular restarts: If the issue persists, you might want to consider implementing a scheduled task to restart the SSM Agent periodically, which can sometimes resolve connectivity issues.

  9. AWS Support: If none of these solutions work, it may be worth contacting AWS Support, as there could be underlying issues with the AWS infrastructure or services that are causing the disconnections.
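
Points 1 and 6 above can be combined into a small fleet check. The sketch below works on the entries returned by the Systems Manager DescribeInstanceInformation API, which include InstanceId, PingStatus, and AgentVersion. The function names are hypothetical, and the filtering logic is one possible interpretation rather than an AWS-recommended procedure.

```python
# Minimum agent version for Session Manager, as stated in point 1.
MIN_AGENT_VERSION = (2, 3, 68, 0)


def parse_version(text):
    """Turn a dotted version string such as '3.2.582.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def instances_needing_attention(instance_info):
    """Return {instance_id: [reasons]} for entries that look unhealthy.

    instance_info is an iterable of dicts shaped like the items in
    'InstanceInformationList' from DescribeInstanceInformation.
    """
    flagged = {}
    for entry in instance_info:
        reasons = []
        if entry.get("PingStatus") != "Online":
            reasons.append(f"ping status {entry.get('PingStatus')}")
        version = entry.get("AgentVersion")
        if version and parse_version(version) < MIN_AGENT_VERSION:
            reasons.append(f"agent {version} is older than 2.3.68.0")
        if reasons:
            flagged[entry["InstanceId"]] = reasons
    return flagged


# Fetching the entries with boto3 (requires credentials, not run here):
#   import boto3
#   pages = boto3.client("ssm").get_paginator(
#       "describe_instance_information").paginate()
#   entries = [e for page in pages for e in page["InstanceInformationList"]]
#   print(instances_needing_attention(entries))
```

Run on a schedule from a machine outside the affected fleet, a check like this could feed a notification when an instance's ping status changes, so a disconnection is noticed before someone needs a session.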

Remember to test any changes in a non-production environment first, and always follow AWS best practices for managing your EC2 instances and SSM Agent.

Sources:

  • Troubleshooting Session Manager - AWS Systems Manager
  • Verify the Session Manager plugin installation - AWS Systems Manager
  • AWS Systems Manager Session Manager - AWS Systems Manager
  • Step 1: Complete Session Manager prerequisites - AWS Systems Manager
  • Support Automation Workflow (SAW) Runbook: Troubleshoot AWS Systems Manager Session Manager | AWS re:Post

answered a year ago
EXPERT
reviewed a year ago
