How do I troubleshoot Systems Manager Agent that's stuck in the starting state or fails to start?
I want to troubleshoot why AWS Systems Manager Agent (SSM Agent) is stuck in the starting state or failing to start.
Short description
A managed instance is an Amazon EC2 instance that's configured for use with Systems Manager. Managed instances can use Systems Manager services such as Run Command, Patch Manager, and Session Manager.
Make sure that your Amazon Elastic Compute Cloud (Amazon EC2) instance meets the following prerequisites to be a managed instance:
- The instance has SSM Agent installed and running.
- The instance has connectivity to the instance metadata service.
- The instance has connectivity with Systems Manager endpoints using the SSM Agent.
- The instance has the correct AWS Identity and Access Management (IAM) role attached to it.
SSM Agent doesn't start when these prerequisites aren't met.
For SSM Agent version 3.1.501.0 and later, you can use the ssm-cli tool to determine whether an instance meets these requirements. With this tool, you can diagnose why a running EC2 instance isn't included in the list of managed instances in Systems Manager.
If your instance doesn't appear as a managed instance in the Systems Manager console, check the SSM Agent logs to troubleshoot further.
- You can find the SSM Agent logs for Linux at /var/log/amazon/ssm.
- You can find the SSM Agent logs for Windows at %PROGRAMDATA%\Amazon\SSM\Logs.
Note: If the instance isn't reporting to Systems Manager, then try logging in using RDP (Windows) or SSH (Linux) to collect the logs. If you still can't log in, then stop the instance and detach the root volume. Then, attach the root volume to another instance in the same Availability Zone as a secondary volume to get the logs.
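After you reach the log directory, a quick scan for error lines can narrow down where to look. The following is a minimal Python sketch, not an official tool. The helper name find_error_lines is hypothetical, and it assumes that the log files end in .log and that failures are logged with the keyword ERROR. Adjust both to match what you see in your logs.

```python
from pathlib import Path

# Log directories as listed in this article; %PROGRAMDATA% is left unexpanded on purpose.
LOG_DIRS = {
    "Linux": "/var/log/amazon/ssm",
    "Windows": r"%PROGRAMDATA%\Amazon\SSM\Logs",
}


def find_error_lines(log_dir, keyword="ERROR", limit=20):
    """Return up to `limit` (file name, line) pairs that contain `keyword`."""
    matches = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log holds unexpected bytes.
        with open(log_file, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if keyword in line:
                    matches.append((log_file.name, line.rstrip()))
                    if len(matches) >= limit:
                        return matches
    return matches
```

On a Linux instance, for example, find_error_lines(LOG_DIRS["Linux"]) returns the first matching lines. The same function works on a directory of logs copied from a detached root volume.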
Resolution
Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you're using the most recent AWS CLI version.
Make sure that you installed the latest version of SSM Agent
It's a best practice to use the latest version of SSM Agent.
For Linux, see Manually install SSM Agent on EC2 instances for Linux.
For Windows, see Manually install SSM Agent on EC2 instances for Windows Server.
Check connectivity to the instance metadata service
Note: This connectivity is required only for an EC2 instance and not for hybrid activation.
SSM Agent relies on EC2 instance metadata to function correctly. SSM Agent can access instance metadata using Instance Metadata Service Version 1 (IMDSv1) or Instance Metadata Service Version 2 (IMDSv2). Make sure that your instance can access the IPv4 address of the Instance Metadata Service: 169.254.169.254.
To verify connectivity to the Instance Metadata Service, run one of the following commands from your EC2 instance:
Linux:
telnet 169.254.169.254 80
or
curl -I http://169.254.169.254/latest/meta-data/
Windows:
curl http://169.254.169.254/latest/meta-data/
or
Test-NetConnection 169.254.169.254 -port 80
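If the instance requires IMDSv2, a plain request without a session token is rejected with HTTP 401 even though the service is reachable. The following Python sketch first requests an IMDSv2 token and then falls back to an IMDSv1-style request, so that it can tell "unreachable" apart from "reachable but rejected". The function names and the timeout values are illustrative assumptions, not part of any AWS tool.

```python
import urllib.error
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=60):
    """Build the IMDSv2 session-token request (HTTP PUT with a TTL header)."""
    return urllib.request.Request(
        IMDS_BASE + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def check_imds(timeout=2):
    """Return a short status string that describes whether IMDS answered."""
    headers = {}
    try:
        with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
            headers["X-aws-ec2-metadata-token"] = resp.read().decode()
    except OSError:
        # No token available: fall through and try a plain IMDSv1-style request.
        pass
    request = urllib.request.Request(IMDS_BASE + "/latest/meta-data/", headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        return f"reachable but request rejected (HTTP {err.code})"
    except OSError as err:
        return f"unreachable ({err})"
```

Call check_imds() on the instance itself. Note that urllib honors proxy environment variables, so an "unreachable" result on an instance that uses a proxy can also mean that 169.254.169.254 isn't excluded from the proxy.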
If your instance can't access metadata, then make sure that metadata is turned on.
For existing EC2 instances, do the following to check if metadata is turned on:
- Open the Amazon EC2 console.
- In the navigation pane, choose Instances.
- Select your instance.
- Choose Actions, Instance settings, Modify instance metadata options.
- In the Modify instance metadata options dialog box, check whether Instance metadata service is enabled.
Or, use the describe-instances command to verify if Instance Metadata Service is turned on:
aws ec2 describe-instances --query "Reservations[*].Instances[*].MetadataOptions" --instance-ids i-012345678910
The output looks like the following:
[
    [
        {
            "State": "applied",
            "HttpTokens": "optional",
            "HttpPutResponseHopLimit": 1,
            "HttpEndpoint": "enabled",
            "HttpProtocolIpv6": "disabled",
            "InstanceMetadataTags": "disabled"
        }
    ]
]
The field HttpEndpoint in the preceding output indicates whether metadata is turned on.
If metadata access is turned off, turn it on.
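When you check several instances, it can help to reduce the describe-instances output to the fields that matter here. The following Python sketch parses the JSON that the preceding query returns. The function name summarize_metadata_options and the summary keys are hypothetical; the field names and values come from the sample output.

```python
import json


def summarize_metadata_options(cli_output):
    """Summarize each MetadataOptions entry from the describe-instances query output."""
    summaries = []
    # The query returns a list of reservations, each a list of MetadataOptions objects.
    for reservation in json.loads(cli_output):
        for options in reservation:
            summaries.append({
                "imds_enabled": options.get("HttpEndpoint") == "enabled",
                "imdsv2_required": options.get("HttpTokens") == "required",
                "hop_limit": options.get("HttpPutResponseHopLimit"),
            })
    return summaries


sample = """[[{"State": "applied", "HttpTokens": "optional",
  "HttpPutResponseHopLimit": 1, "HttpEndpoint": "enabled",
  "HttpProtocolIpv6": "disabled", "InstanceMetadataTags": "disabled"}]]"""
print(summarize_metadata_options(sample))
```

If imds_enabled is False for an entry, then metadata access is turned off for that instance.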
If a proxy is configured in the instance, then make sure that the instance bypasses metadata IP (169.254.169.254). For more information, see the following user guides:
Linux: Configuring SSM Agent to use a proxy (Linux)
Windows: Configure SSM Agent to use a proxy for Windows Server instances
For Windows, check the specific route to metadata (169.254.169.254).
In PowerShell, run the route print and ipconfig /all commands. Then, check the route print output for the metadata route:
Network Address    Netmask            Gateway Address
169.254.169.254    255.255.255.255    <Subnet Router Address>
Confirm that the Gateway Address field in the output matches the default gateway for the instance's primary network interface.
If the route isn't present or the Gateway Address field doesn't match, then do the following:
- Confirm that the latest version of EC2Config (Windows Server 2012R2 and earlier) or EC2Launch (Windows Server 2016 or later) is installed on the instance.
- To apply the route to the instance, restart the EC2Config service.
- If the routes are correct, but the instance is still unable to retrieve metadata, then review your instance's Windows Firewall, third-party firewall, and antivirus configuration. Confirm that traffic to 169.254.169.254 isn't explicitly denied.
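The comparison between the metadata route and the default gateway can also be scripted. The following Python sketch assumes that you saved the text output of route print and that each route row starts with the destination, the netmask, and the gateway, in that order. The function names and the sample addresses are hypothetical.

```python
METADATA_IP = "169.254.169.254"


def metadata_route_gateway(route_print_output):
    """Return the gateway of the host route to 169.254.169.254, or None if absent."""
    for line in route_print_output.splitlines():
        fields = line.split()
        # Route rows start with: destination, netmask, gateway.
        if len(fields) >= 3 and fields[0] == METADATA_IP and fields[1] == "255.255.255.255":
            return fields[2]
    return None


def route_matches_gateway(route_print_output, default_gateway):
    """True when the metadata route exists and points at the expected default gateway."""
    return metadata_route_gateway(route_print_output) == default_gateway
```

For example, pass the saved output and the Default Gateway value from ipconfig /all to route_matches_gateway. A False result means that the route is missing or points to a different gateway.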
To manually reset the metadata routes, do the following:
Note: These configured changes populate immediately. You don't need to restart the instance for the changes to take effect.
1. Run the following commands to remove the existing metadata routes from the route table:
route delete 169.254.169.123
route delete 169.254.169.249
route delete 169.254.169.250
route delete 169.254.169.251
route delete 169.254.169.252
route delete 169.254.169.253
route delete 169.254.169.254
2. Run the following command:
ipconfig /all
3. Note the Default Gateway IP that's returned from the command in Step 2.
4. Run the following commands. Replace DefaultGatewayIP with the IP address that you retrieved in Step 3.
route -p add 169.254.169.123 MASK 255.255.255.255 DefaultGatewayIP
route -p add 169.254.169.249 MASK 255.255.255.255 DefaultGatewayIP
route -p add 169.254.169.250 MASK 255.255.255.255 DefaultGatewayIP
route -p add 169.254.169.251 MASK 255.255.255.255 DefaultGatewayIP
route -p add 169.254.169.252 MASK 255.255.255.255 DefaultGatewayIP
route -p add 169.254.169.253 MASK 255.255.255.255 DefaultGatewayIP
route -p add 169.254.169.254 MASK 255.255.255.255 DefaultGatewayIP
5. Restart SSM Agent.
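Typing fourteen route commands by hand invites mistakes. As a sketch, the following Python snippet generates the delete and add commands from the procedure above for a given gateway, so that you can review them before you run them. The function name build_route_commands and the sample gateway 10.0.0.1 are hypothetical. The snippet only prints text and doesn't change any routes.

```python
import ipaddress

# Link-local addresses listed in this article's route commands.
METADATA_ROUTE_IPS = [
    "169.254.169.123", "169.254.169.249", "169.254.169.250", "169.254.169.251",
    "169.254.169.252", "169.254.169.253", "169.254.169.254",
]


def build_route_commands(default_gateway_ip):
    """Return the delete commands followed by the persistent add commands."""
    # Raises ValueError early if the gateway isn't a valid IPv4 address.
    gateway = str(ipaddress.IPv4Address(default_gateway_ip))
    deletes = [f"route delete {ip}" for ip in METADATA_ROUTE_IPS]
    adds = [f"route -p add {ip} MASK 255.255.255.255 {gateway}" for ip in METADATA_ROUTE_IPS]
    return deletes + adds


for command in build_route_commands("10.0.0.1"):
    print(command)
```

The address validation catches the common mistake of leaving the DefaultGatewayIP placeholder in place.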
Check connectivity with Systems Manager endpoints
The best method to verify this connectivity depends on your operating system. For a list of Systems Manager endpoints by Region, see AWS Systems Manager endpoints and quotas.
Note: In the following examples, the ssmmessages endpoint is required only for AWS Systems Manager Session Manager.
For EC2 Linux instances, run either telnet or netcat commands to verify connectivity to endpoints on port 443.
Telnet
telnet ssm.RegionID.amazonaws.com 443
telnet ec2messages.RegionID.amazonaws.com 443
telnet ssmmessages.RegionID.amazonaws.com 443
Be sure to replace RegionID with your AWS Region ID.
If the connection is successful, you get an output that's similar to the following:
root@111800186:~# telnet ssm.us-east-1.amazonaws.com 443
Trying 52.46.141.158...
Connected to ssm.us-east-1.amazonaws.com.
Escape character is '^]'.
To exit from telnet, hold down the Ctrl key and press the ] key. Enter quit, and then press Enter.
Netcat
nc -vz ssm.RegionID.amazonaws.com 443
nc -vz ec2messages.RegionID.amazonaws.com 443
nc -vz ssmmessages.RegionID.amazonaws.com 443
Note: Netcat isn't preinstalled on Amazon EC2 instances. To manually install Netcat, see Ncat on the Nmap website.
For EC2 Windows instances, run the following Windows PowerShell commands to verify connectivity to endpoints on port 443:
Test-NetConnection ssm.RegionID.amazonaws.com -port 443
Test-NetConnection ec2messages.RegionID.amazonaws.com -port 443
Test-NetConnection ssmmessages.RegionID.amazonaws.com -port 443
If the connection is successful, you get an output that's similar to the following:
PS C:\Users\ec2-user> Test-NetConnection ssm.us-east-1.amazonaws.com -port 443
ComputerName     : ssm.us-east-1.amazonaws.com
RemoteAddress    : 52.46.145.233
RemotePort       : 443
InterfaceAlias   : Ethernet
SourceAddress    : 10.35.218.225
TcpTestSucceeded : True
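If neither telnet nor Netcat is available, or you want one check that runs on both Linux and Windows, a TCP connection test can be sketched in Python. The following example follows the hostname pattern used in the preceding commands. The function names are hypothetical, and the 5-second timeout is an arbitrary assumption.

```python
import socket

SERVICES = ("ssm", "ec2messages", "ssmmessages")


def endpoint_hostnames(region_id):
    """Build the three endpoint hostnames from this article for one Region ID."""
    return [f"{service}.{region_id}.amazonaws.com" for service in SERVICES]


def can_connect(host, port=443, timeout=5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(region_id):
    """Map each endpoint hostname to its TCP reachability on port 443."""
    return {host: can_connect(host) for host in endpoint_hostnames(region_id)}
```

For example, check_endpoints("us-east-1") returns a dictionary with one True or False value for each endpoint. Like telnet, this confirms only that a TCP connection opens. It doesn't validate TLS, proxy settings, or IAM permissions.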
Check the IAM role for SSM Agent
SSM Agent requires certain IAM permissions to make the Systems Manager API calls. You can manage these permissions using one of the following approaches:
- Default Host Management Configuration allows Systems Manager to manage your Amazon EC2 instances automatically. It allows instance management without the use of instance profiles. This configuration makes sure that Systems Manager has permissions to manage all instances in the Region and account.
- You can grant access at the instance level using an IAM instance profile. An instance profile is a container that passes IAM role information to an instance at launch. For more information, see Alternative configuration.
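To see which instance profile, if any, is attached, you can read the iam/info document from the instance metadata service at http://169.254.169.254/latest/meta-data/iam/info. The following Python sketch extracts the profile name from that document. The function name instance_profile_name is hypothetical, and the sample ARN is made up. An instance that relies only on Default Host Management Configuration might have no instance profile, in which case there is no document to parse.

```python
import json


def instance_profile_name(iam_info_json):
    """Return the instance profile name from an iam/info document, or None."""
    try:
        info = json.loads(iam_info_json)
    except (TypeError, ValueError):
        # Not JSON, for example an empty or error response.
        return None
    arn = info.get("InstanceProfileArn", "") if isinstance(info, dict) else ""
    # The profile name is the last path segment of the ARN.
    return arn.rsplit("/", 1)[-1] if "/" in arn else None


sample = """{"Code": "Success",
  "InstanceProfileArn": "arn:aws:iam::111122223333:instance-profile/ExampleSSMProfile"}"""
print(instance_profile_name(sample))
```

The name that this returns identifies the instance profile, not the permissions. You still need to confirm in IAM that the associated role grants the permissions that SSM Agent requires.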
Related information
Make an Amazon EBS volume available for use on Linux
Make an Amazon EBS volume available for use on Windows
Why does my Amazon EC2 Windows instance generate a "Waiting for the metadata service" error?