How do I troubleshoot Systems Manager Agent when it's stuck in the starting state or fails to start?

I want to troubleshoot why AWS Systems Manager Agent (SSM Agent) is stuck in the starting state or failing to start.

Short description

A managed instance is an Amazon Elastic Compute Cloud (Amazon EC2) instance that's configured for use with Systems Manager. Managed instances can use Systems Manager capabilities such as Run Command, Patch Manager, and Session Manager.

To be a managed instance, your EC2 instance must meet the following prerequisites:

  • SSM Agent is installed and running on the instance.
  • The instance can connect to the instance metadata service.
  • SSM Agent can connect to the Systems Manager endpoints.
  • SSM Agent has the required AWS Identity and Access Management (IAM) permissions, through either an IAM role attached to the instance or Default Host Management Configuration.

If any of these prerequisites isn't met, then SSM Agent might fail to start or remain stuck in the starting state.

For SSM Agent version 3.1.501.0 and later, you can use the ssm-cli tool to check whether an instance meets these requirements. For example, on Linux, run sudo ssm-cli get-diagnostics --output table. The tool helps you diagnose why a running EC2 instance isn't included in the list of managed instances in Systems Manager.
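Because ssm-cli availability depends on the agent version, it helps to compare versions correctly. Comparing dotted version strings as plain text ranks 3.10 below 3.9. The following Python sketch compares them numerically instead. It assumes that you already have the agent's version string (how you read it depends on your OS and install method), and the helper names are hypothetical.

```python
from __future__ import annotations

# Minimum SSM Agent version that includes ssm-cli, as stated in the text above.
SSM_CLI_MIN_VERSION = "3.1.501.0"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '3.2.582.0' into (3, 2, 582, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def has_ssm_cli(agent_version: str, minimum: str = SSM_CLI_MIN_VERSION) -> bool:
    """Compare numerically, so '3.10.0.0' correctly counts as newer than '3.9.0.0'."""
    return parse_version(agent_version) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("3.0.1390.0", "3.1.501.0", "3.10.0.0"):
        print(version, "has ssm-cli" if has_ssm_cli(version) else "upgrade first")
```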

If your instance doesn't appear as a managed instance in the Systems Manager console, check the SSM Agent logs to troubleshoot further.

  • You can find the SSM Agent logs for Linux at /var/log/amazon/ssm.
  • You can find the SSM Agent logs for Windows at %PROGRAMDATA%\Amazon\SSM\Logs.

Note: If the instance isn't reporting to Systems Manager, then connect to it with RDP (Windows) or SSH (Linux) to collect the logs. If you can't connect, then stop the instance and detach its root volume. Then, attach the volume to another instance in the same Availability Zone as a secondary volume, and read the logs from that instance.
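The logs can be long, so a quick filter for error and warning entries is often the fastest first step. The following Python sketch scans the default log directory from the list above. It assumes that the agent writes amazon-ssm-agent.log and errors.log there and that each entry carries a level keyword such as ERROR or WARN as a separate word; the function names are hypothetical.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path

# Default SSM Agent log directories from the text above.
if sys.platform == "win32":
    DEFAULT_LOG_DIR = Path(os.path.expandvars(r"%PROGRAMDATA%\Amazon\SSM\Logs"))
else:
    DEFAULT_LOG_DIR = Path("/var/log/amazon/ssm")

# Assumption: the agent writes these files and tags each entry with a level
# keyword (for example "2024-01-01 10:00:00 ERROR [...] message").
LOG_FILES = ("amazon-ssm-agent.log", "errors.log")
PROBLEM_LEVELS = {"ERROR", "WARN"}


def is_problem_line(line: str) -> bool:
    """True if any whitespace-separated token on the line is a problem level."""
    return not PROBLEM_LEVELS.isdisjoint(line.split())


def find_problem_lines(log_dir: Path = DEFAULT_LOG_DIR, limit: int = 50) -> list[str]:
    """Return the last `limit` ERROR/WARN lines, reading files in LOG_FILES order."""
    hits: list[str] = []
    for name in LOG_FILES:
        path = log_dir / name
        if not path.is_file():
            continue  # the file might not exist yet, for example on a fresh install
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits.extend(f"{name}: {line.rstrip()}" for line in handle if is_problem_line(line))
    return hits[-limit:]
```

For a root volume that you attached to another instance, pass the mounted path instead. For example, if you mounted it at /mnt/rescue (a hypothetical mount point), call find_problem_lines(Path("/mnt/rescue/var/log/amazon/ssm")).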

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you're using the most recent AWS CLI version.

Make sure that you installed the latest version of SSM Agent

It's a best practice to use the latest version of SSM Agent.

For Linux, see Manually install SSM Agent on EC2 instances for Linux.

For Windows, see Manually install SSM Agent on EC2 instances for Windows Server.

Check connectivity to the instance metadata service

Note: This connectivity is required only for an EC2 instance and not for hybrid activation.

SSM Agent relies on EC2 instance metadata to function correctly. SSM Agent can access instance metadata using Instance Metadata Service Version 1 (IMDSv1) or Instance Metadata Service Version 2 (IMDSv2). Make sure that your instance can access the IPv4 address of the Instance Metadata Service, 169.254.169.254.

To verify connectivity to the Instance Metadata Service, run one of the following commands from your EC2 instance:

Linux:

telnet 169.254.169.254 80
or
curl -I http://169.254.169.254/latest/meta-data/

Windows:

curl http://169.254.169.254/latest/meta-data/
or
Test-NetConnection 169.254.169.254 -Port 80
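The preceding commands only show whether something answers on port 80. The following Python sketch (standard library only) goes one step further: it requests an IMDSv2 session token first, falls back to a request without a token, and reports which mode worked. It deliberately ignores any proxy environment variables so that it tests the direct path to the metadata IP. The token path and headers are the standard IMDSv2 ones; the function name and result strings are hypothetical.

```python
import urllib.error
import urllib.request

IMDS_BASE = "http://169.254.169.254"

# An empty ProxyHandler ignores http_proxy/HTTPS_PROXY, so the probe tests the
# direct path to the metadata service.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_imds(base_url: str = IMDS_BASE, timeout: float = 2.0) -> str:
    """Probe the metadata service with IMDSv2 first, then without a token."""
    headers = {}
    token_request = urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    try:
        with _DIRECT.open(token_request, timeout=timeout) as response:
            headers["X-aws-ec2-metadata-token"] = response.read().decode()
    except urllib.error.HTTPError:
        pass  # the service answered but issued no token; try without one
    except OSError:
        return "unreachable"  # refused, timed out, or no route to the address

    mode = "IMDSv2" if headers else "IMDSv1"
    try:
        request = urllib.request.Request(f"{base_url}/latest/meta-data/", headers=headers)
        with _DIRECT.open(request, timeout=timeout) as response:
            return f"reachable ({mode}, HTTP {response.status})"
    except urllib.error.HTTPError as err:
        # An HTTP error status still proves that the service is reachable.
        return f"reachable ({mode}, HTTP {err.code})"
    except OSError:
        return "unreachable"
```

Run check_imds() on the instance. An unreachable result points to a network, route, proxy, or firewall problem, which the following checks help you narrow down.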

If your instance can't access metadata, then make sure that metadata is turned on.

For existing EC2 instances, do the following to check if metadata is turned on:

  1. Open the Amazon EC2 console.
  2. In the navigation pane, choose Instances.
  3. Select your instance.
  4. Choose Actions, Instance settings, Modify instance metadata options.
  5. In the Modify instance metadata options dialog box, check whether Instance metadata service is enabled.

Or, run the describe-instances command to verify that the Instance Metadata Service is turned on:

aws ec2 describe-instances --query "Reservations[*].Instances[*].MetadataOptions" --instance-ids i-012345678910

The output looks like the following:

[
  [
    {
      "State": "applied",
      "HttpTokens": "optional",
      "HttpPutResponseHopLimit": 1,
      "HttpEndpoint": "enabled",
      "HttpProtocolIpv6": "disabled",
      "InstanceMetadataTags": "disabled"
    }
  ]
]

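The HttpEndpoint field in the preceding output shows whether the metadata service is turned on (enabled) or turned off (disabled). The HttpTokens field shows whether IMDSv2 is optional (IMDSv1 and IMDSv2 both work) or required (only IMDSv2 works).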
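When you check many instances, reading the nested JSON by eye gets tedious. The following Python sketch turns the output of the describe-instances query above into one short verdict per instance, based on the HttpEndpoint and HttpTokens fields. It assumes the JSON output format (for example, with --output json) and the nesting that the --query expression produces; the function name is hypothetical.

```python
from __future__ import annotations

import json


def summarize_metadata_options(cli_json: str) -> list[str]:
    """Summarize each MetadataOptions object from the describe-instances query.

    The --query expression returns a list of reservations, each a list of
    MetadataOptions objects, so both levels are iterated.
    """
    summaries = []
    for reservation in json.loads(cli_json):
        for options in reservation:
            endpoint = options.get("HttpEndpoint", "unknown")
            tokens = options.get("HttpTokens", "unknown")
            if endpoint != "enabled":
                summaries.append(f"HttpEndpoint is {endpoint}: turn on instance metadata")
            elif tokens == "required":
                summaries.append("metadata on; IMDSv2 only (session tokens required)")
            elif tokens == "optional":
                summaries.append("metadata on; IMDSv1 and IMDSv2 both accepted")
            else:
                summaries.append(f"metadata on; HttpTokens is {tokens}")
    return summaries
```

For the sample output shown earlier, this returns a single entry saying that metadata is on and that both IMDSv1 and IMDSv2 are accepted.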

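If metadata access is turned off, turn it on. You can use the Modify instance metadata options dialog box, or run the modify-instance-metadata-options command:

aws ec2 modify-instance-metadata-options --instance-id i-012345678910 --http-endpoint enabled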

If a proxy is configured on the instance, then make sure that traffic to the metadata IP address (169.254.169.254) bypasses the proxy. For more information, see the following user guides:

Linux: Configuring SSM Agent to use a proxy (Linux)

Windows: Configure SSM Agent to use a proxy for Windows Server instances
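Proxy bypass is commonly expressed as a comma-separated no_proxy value. The following Python sketch checks whether a given no_proxy string exempts the metadata IP. It's an approximation: it understands "*", exact IP addresses, and CIDR blocks, and it ignores hostnames. Proxy clients differ in which forms they honor (not every client accepts CIDR entries), so treat a True result as a hint rather than proof. Pass it the value that SSM Agent actually runs with, which can differ from your login shell's environment. The function name is hypothetical.

```python
import ipaddress

METADATA_IP = ipaddress.ip_address("169.254.169.254")


def no_proxy_covers(no_proxy: str, target=METADATA_IP) -> bool:
    """Return True if a comma-separated no_proxy value exempts `target`."""
    for entry in no_proxy.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry == "*":
            return True  # "*" conventionally means "bypass the proxy for every host"
        try:
            # ip_network accepts a bare address (treated as /32) or a CIDR block.
            if target in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # a hostname or domain suffix, which can't match a literal IP here
    return False
```

For example, no_proxy_covers("localhost,127.0.0.1,169.254.169.254") returns True, while no_proxy_covers("localhost,.internal") returns False.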

For Windows, also check the route to the metadata IP address (169.254.169.254).

In PowerShell, run the route print and ipconfig /all commands. Then, find the 169.254.169.254 entry in the route print output:

    Network Address        Netmask             Gateway Address
    169.254.169.254        255.255.255.255     <Subnet Router Address>

Confirm that the Gateway Address field in the output matches the default gateway for the instance's primary network interface.
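Checking the route by eye is error-prone when route print lists many entries. The following Python sketch finds every 169.254.169.254 row with a 255.255.255.255 netmask in route print output and compares its gateway with the default gateway that you read from ipconfig /all. It assumes the usual column order (destination, netmask, gateway) in both the Active Routes and Persistent Routes tables; the function names are hypothetical.

```python
from __future__ import annotations

METADATA_IP = "169.254.169.254"
HOST_MASK = "255.255.255.255"


def metadata_route_gateways(route_print_output: str) -> list[str]:
    """Return the gateway column of every 169.254.169.254/32 row in route print output."""
    gateways = []
    for line in route_print_output.splitlines():
        fields = line.split()
        # Rows in both route tables start with destination, netmask, gateway.
        if len(fields) >= 3 and fields[0] == METADATA_IP and fields[1] == HOST_MASK:
            gateways.append(fields[2])
    return gateways


def metadata_route_ok(route_print_output: str, default_gateway: str) -> bool:
    """True only if a metadata route exists and every such row uses the default gateway."""
    gateways = metadata_route_gateways(route_print_output)
    return bool(gateways) and all(gw == default_gateway for gw in gateways)
```

On the instance, you could pass it subprocess.run(["route", "print"], capture_output=True, text=True).stdout. A False result corresponds to the missing or mismatched route case that the following steps address.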

If the route isn't present or the Gateway Address field doesn't match, then do the following:

  1. Confirm that the latest version of EC2Config (Windows Server 2012 R2 and earlier) or EC2Launch (Windows Server 2016 or later) is installed on the instance.
  2. To apply the route to the instance, restart the EC2Config service.
  3. If the routes are correct but the instance still can't retrieve metadata, then review the configuration of Windows Firewall, any third-party firewall, and antivirus software on the instance. Confirm that they don't explicitly deny traffic to 169.254.169.254.

To manually reset the metadata routes, do the following:

Note: Route changes take effect immediately. You don't need to restart the instance.

  1. Run the following commands to remove the existing metadata routes from the route table:

    route delete 169.254.169.123
    route delete 169.254.169.249
    route delete 169.254.169.250
    route delete 169.254.169.251
    route delete 169.254.169.252
    route delete 169.254.169.253
    route delete 169.254.169.254
  2. Run the following command:

    ipconfig /all
  3. Note the Default Gateway IP that's returned from the command in Step 2.

  4. Run the following commands. Replace DefaultGatewayIP with the IP address that you retrieved in Step 3.

    route -p add 169.254.169.123 MASK 255.255.255.255 DefaultGatewayIP
    route -p add 169.254.169.249 MASK 255.255.255.255 DefaultGatewayIP
    route -p add 169.254.169.250 MASK 255.255.255.255 DefaultGatewayIP
    route -p add 169.254.169.251 MASK 255.255.255.255 DefaultGatewayIP
    route -p add 169.254.169.252 MASK 255.255.255.255 DefaultGatewayIP
    route -p add 169.254.169.253 MASK 255.255.255.255 DefaultGatewayIP
    route -p add 169.254.169.254 MASK 255.255.255.255 DefaultGatewayIP
  5. Restart SSM Agent. For example, in PowerShell, run Restart-Service AmazonSSMAgent.
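Typing fourteen route commands by hand invites typos in the gateway address. The following Python sketch prints the same delete and add commands as the steps above for a gateway that you supply, after checking that it's a valid IPv4 address. It only prints the commands for you to review and run in an elevated prompt; it doesn't change any routes itself. The function name and the 10.0.0.1 placeholder are hypothetical.

```python
from __future__ import annotations

import ipaddress

# Link-local addresses whose routes the steps above delete and re-add.
METADATA_ROUTE_IPS = (
    "169.254.169.123",
    "169.254.169.249",
    "169.254.169.250",
    "169.254.169.251",
    "169.254.169.252",
    "169.254.169.253",
    "169.254.169.254",
)


def route_reset_commands(default_gateway: str) -> list[str]:
    """Build the route delete and route -p add commands for `default_gateway`."""
    gateway = ipaddress.IPv4Address(default_gateway)  # raises ValueError if invalid
    deletes = [f"route delete {ip}" for ip in METADATA_ROUTE_IPS]
    adds = [f"route -p add {ip} MASK 255.255.255.255 {gateway}" for ip in METADATA_ROUTE_IPS]
    return deletes + adds


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder; use the Default Gateway from ipconfig /all.
    print("\n".join(route_reset_commands("10.0.0.1")))
```

Printing instead of executing keeps a human in the loop, which matters when the commands replace persistent routes.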

Check connectivity with Systems Manager endpoints

The best method to verify this connectivity depends on your operating system. For a list of Systems Manager endpoints by Region, see AWS Systems Manager endpoints and quotas.

Note: In the following examples, the ssmmessages endpoint is required only for AWS Systems Manager Session Manager.

For EC2 Linux instances, run either telnet or netcat commands to verify connectivity to endpoints on port 443.

Telnet

telnet ssm.RegionID.amazonaws.com 443
telnet ec2messages.RegionID.amazonaws.com 443
telnet ssmmessages.RegionID.amazonaws.com 443

Replace RegionID with your AWS Region code, such as us-east-1.

If the connection is successful, you get an output that's similar to the following:

root@111800186:~# telnet ssm.us-east-1.amazonaws.com 443
Trying 52.46.141.158...
Connected to ssm.us-east-1.amazonaws.com.
Escape character is '^]'.

To exit from telnet, hold down the Ctrl key and press the ] key. Then, enter quit and press Enter.

Netcat

nc -vz ssm.RegionID.amazonaws.com 443
nc -vz ec2messages.RegionID.amazonaws.com 443
nc -vz ssmmessages.RegionID.amazonaws.com 443

Note: Netcat might not be preinstalled on your EC2 instance. To install it manually, see Ncat on the Nmap website.

For EC2 Windows instances, run the following Windows PowerShell commands to verify connectivity to endpoints on port 443:

Test-NetConnection ssm.RegionID.amazonaws.com -port 443
Test-NetConnection ec2messages.RegionID.amazonaws.com -port 443
Test-NetConnection ssmmessages.RegionID.amazonaws.com -port 443

If the connection is successful, you get an output that's similar to the following:

PS C:\Users\ec2-user> Test-NetConnection ssm.us-east-1.amazonaws.com -port 443
ComputerName     : ssm.us-east-1.amazonaws.com
RemoteAddress    : 52.46.145.233
RemotePort       : 443
InterfaceAlias   : Ethernet
SourceAddress    : 10.35.218.225
TcpTestSucceeded : True
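The telnet, nc, and Test-NetConnection checks can also be combined into one cross-platform script. The following Python sketch (standard library only) builds the endpoint hostnames for a Region and, for each one, separates a DNS lookup failure from a TCP connection failure. The function names are hypothetical, and the China Region domain is an assumption to confirm in AWS Systems Manager endpoints and quotas.

```python
from __future__ import annotations

import socket


def ssm_endpoints(region: str, session_manager: bool = True) -> list[str]:
    """Return the Systems Manager endpoint hostnames to test for `region`."""
    # Assumption: China Regions (cn-*) use the amazonaws.com.cn domain and all
    # other Regions use amazonaws.com; confirm in the endpoints reference.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    services = ["ssm", "ec2messages"]
    if session_manager:
        services.append("ssmmessages")  # required only for Session Manager
    return [f"{service}.{region}.{suffix}" for service in services]


def check_endpoint(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Resolve `host`, then open a plain TCP connection, as telnet or nc does."""
    try:
        address = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    except socket.gaierror as err:
        return f"{host}: DNS lookup failed ({err})"
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return f"{host}: connected to {address}:{port}"
    except OSError as err:
        return f"{host}: resolved to {address}, but TCP connect failed ({err})"
```

For example, run for host in ssm_endpoints("us-east-1"): print(check_endpoint(host)) on the instance. Like the commands above, this proves only TCP reachability on port 443 and doesn't perform a TLS handshake. If SSM Agent reaches the endpoints through a proxy, a direct test can fail even when the agent's own path works.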

Check the IAM role for SSM Agent

SSM Agent requires certain IAM permissions to make the Systems Manager API calls. You can manage these permissions using one of the following approaches:

  • Default Host Management Configuration lets Systems Manager manage your EC2 instances automatically, without instance profiles. When it's turned on, Systems Manager has permissions to manage all instances in the Region and account where you configured it.
  • You can grant access at the instance level using an IAM instance profile. An instance profile is a container that passes IAM role information to an instance at launch. For more information, see Alternative configuration.
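For the instance profile approach, the role must trust the EC2 service so that the instance can assume it. The following Python sketch prints such a trust policy. The text doesn't name a specific permissions policy; this sketch assumes the AWS managed policy AmazonSSMManagedInstanceCore, which AWS provides for SSM Agent's core permissions. The function and constant names are hypothetical.

```python
import json

# Assumption: AmazonSSMManagedInstanceCore supplies the core SSM Agent permissions.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy() -> dict:
    """Trust policy that lets EC2 assume the role passed through an instance profile."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ec2_trust_policy(), indent=2))
    print("Permissions policy to attach:", SSM_CORE_POLICY_ARN)
```

You could save the JSON as trust.json and pass it to aws iam create-role --assume-role-policy-document file://trust.json, and then attach the permissions policy to that role.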

Related information

Why is my EC2 instance not displaying as a managed node or showing a "Connection lost" status in Systems Manager?

Make an Amazon EBS volume available for use on Linux

Make an Amazon EBS volume available for use on Windows

Why does my Amazon EC2 Windows instance generate a "Waiting for the metadata service" error?

AWS OFFICIAL
Updated a year ago