How do I troubleshoot a State Manager association that's stuck in "Failed" or "Pending" status?

I want to troubleshoot a State Manager association that’s stuck in "Failed" or "Pending" status.

Short description

State Manager, a capability of AWS Systems Manager, is a secure and scalable configuration management service. State Manager automates the process of keeping your managed nodes and other AWS resources in a state that you define.

A State Manager association is a configuration that's assigned to your managed instances. The configuration defines the state that you want to maintain on your instances.

When you create a State Manager association, Systems Manager binds the schedule, targets, document, and parameter information that you specify to the managed instances. The association status is Pending while the system attempts to reach all targets and immediately apply the state that's specified in the association.
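As a minimal sketch, you can also read the association status programmatically. The example assumes the boto3 describe_association response shape, where the status is reported under AssociationDescription.Overview. The helper names summarize_association and fetch_association are hypothetical and chosen only for illustration.

```python
def summarize_association(response):
    """Extract (status, detailed_status, counts) from a describe_association response.

    Missing keys fall back to "Unknown" or an empty dict so that partial
    responses don't raise an exception.
    """
    overview = response.get("AssociationDescription", {}).get("Overview", {})
    status = overview.get("Status", "Unknown")
    detailed_status = overview.get("DetailedStatus", "Unknown")
    counts = overview.get("AssociationStatusAggregatedCount", {})
    return status, detailed_status, counts


def fetch_association(association_id, region_name=None):
    """Call SSM for one association (requires boto3 and AWS credentials)."""
    # Imported lazily so that the parsing helper above works without the SDK.
    import boto3

    ssm = boto3.client("ssm", region_name=region_name)
    return ssm.describe_association(AssociationId=association_id)
```

To use the sketch, pass the result of fetch_association with your own association ID to summarize_association, and then check whether the first returned value is Pending or Failed.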

Prerequisites

Roles and permissions for Systems Manager

To allow users to create an association, you must attach the AWS managed AmazonSSMFullAccess policy to the user.

For associations that run Command documents through Run Command, a capability of AWS Systems Manager, the target instance must be a managed instance. The instance also requires an AWS Identity and Access Management (IAM) role with permissions to retrieve and run Systems Manager documents. You can find the minimum required permissions for this role in the AWS managed AmazonSSMManagedInstanceCore policy.
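The following sketch shows one way to check whether the AmazonSSMManagedInstanceCore policy is attached to an instance role. It assumes the boto3 IAM list_attached_role_policies response shape and reads only the first page of results. The helper names are hypothetical. A result of False doesn't prove that permissions are missing, because a custom or inline policy can grant equivalent permissions.

```python
REQUIRED_POLICY = "AmazonSSMManagedInstanceCore"


def has_core_policy(response, policy_name=REQUIRED_POLICY):
    """Return True if the named managed policy appears in AttachedPolicies."""
    attached = response.get("AttachedPolicies", [])
    return any(policy.get("PolicyName") == policy_name for policy in attached)


def fetch_attached_policies(role_name):
    """List managed policies attached to a role (requires boto3 and credentials)."""
    # Imported lazily so that has_core_policy can be used without the SDK.
    import boto3

    iam = boto3.client("iam")
    # First page only; roles with many attached policies need pagination.
    return iam.list_attached_role_policies(RoleName=role_name)
```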

Automation Association

If the association targets an Automation document, then the association also requires permissions to run the automation. For more information, see Method 2: Use IAM to configure roles for Automation.

Connectivity and agent configuration

Verify that the following resources and settings are configured:

  • AWS Systems Manager Agent (SSM Agent) is installed on the instance so that the instance can use Run Command.
  • Instance metadata is accessible on all target instances, excluding on-premises managed instances.
  • The target instance has outbound access over TCP port 443 to the SSM Regional service endpoints, ec2messages.region-id.amazonaws.com and ssm.region-id.amazonaws.com.
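To test the connectivity requirement from the target instance, a short script such as the following can help. It's a sketch that assumes the standard amazonaws.com domain shown in the endpoint names above, and it checks only that a TCP connection on port 443 opens. It doesn't validate TLS, proxies, or IAM permissions. The function names are hypothetical.

```python
import socket


def ssm_endpoints(region):
    """Build the two Regional endpoint hostnames named in this article."""
    return [
        f"ssm.{region}.amazonaws.com",
        f"ec2messages.{region}.amazonaws.com",
    ]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(region):
    """Map each endpoint hostname to its reachability result."""
    return {host: can_connect(host) for host in ssm_endpoints(region)}
```

Run check_endpoints on the instance itself with the instance's Region. An endpoint that maps to False points to a security group, network ACL, route table, or DNS issue.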

Troubleshoot an association that's stuck in Pending or Failed status

If the association remains Pending or Failed, then first check the GitHub website to confirm that you installed the latest version of SSM Agent. Then, check the status of the resource where the association is applied and view the history to confirm if there were any invocations.

To check the status, complete the following steps:

  1. Open the Systems Manager console.
  2. In the navigation pane, choose State Manager.
  3. Choose the Association Id for the association that's stuck in the Pending or Failed state.
  4. Choose the Execution history tab to view the invocation history. If the history lists invocations, then choose Execution id to see the resource type, status, and other details.
    Note: If there aren't any invocations listed in the history, then verify that the instance is a managed instance. From the Systems Manager console, the instance must be listed under Managed instances, and the SSM Agent ping status must be Online.
  5. On the Association execution targets page for the Execution ID, find the target instance under Resource ID.
  6. Select the target instance Resource id, and then choose Output.
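The console steps above can be approximated with the AWS SDK. The following sketch assumes the boto3 describe_association_executions and describe_association_execution_targets response shapes and reads only the first page of each result. The helper names and the set of statuses that are treated as stuck are assumptions for illustration.

```python
STUCK_STATUSES = {"Failed", "Pending"}


def stuck_targets(response, stuck=STUCK_STATUSES):
    """Return the execution targets whose Status is in the stuck set."""
    rows = []
    for target in response.get("AssociationExecutionTargets", []):
        if target.get("Status") in stuck:
            rows.append(
                {
                    "ResourceId": target.get("ResourceId"),
                    "Status": target.get("Status"),
                    "DetailedStatus": target.get("DetailedStatus"),
                    # Identifies where the output for this target is stored.
                    "OutputSource": target.get("OutputSource", {}),
                }
            )
    return rows


def fetch_latest_execution_targets(association_id, region_name=None):
    """Fetch targets of the most recent execution, or None if none exist."""
    # Imported lazily so that stuck_targets can be used without the SDK.
    import boto3

    ssm = boto3.client("ssm", region_name=region_name)
    executions = ssm.describe_association_executions(
        AssociationId=association_id
    ).get("AssociationExecutions", [])
    if not executions:
        return None  # No invocations: verify that the instance is managed.
    latest = max(executions, key=lambda execution: execution["CreatedTime"])
    return ssm.describe_association_execution_targets(
        AssociationId=association_id, ExecutionId=latest["ExecutionId"]
    )
```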

The output displays details and an error message about why the association failed. For more information on error messages, see the following:

If your instance doesn't appear under Managed instances, or if the SSM Agent ping status is Connection lost, then additional troubleshooting is required. To troubleshoot these issues, see Why is my EC2 instance not displaying as a managed node or showing a "Connection lost" status in Systems Manager?
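As a rough sketch, the managed instance and ping status checks can also be scripted. The example assumes the boto3 describe_instance_information response shape, including the PingStatus, AgentVersion, and IsLatestVersion fields. The helper names are hypothetical.

```python
def managed_node_state(response, instance_id):
    """Summarize whether an instance is managed and what SSM Agent reports."""
    for info in response.get("InstanceInformationList", []):
        if info.get("InstanceId") == instance_id:
            return {
                "managed": True,
                "ping_status": info.get("PingStatus"),
                "agent_version": info.get("AgentVersion"),
                "is_latest_version": info.get("IsLatestVersion"),
            }
    # The instance isn't registered with Systems Manager.
    return {
        "managed": False,
        "ping_status": None,
        "agent_version": None,
        "is_latest_version": None,
    }


def fetch_instance_information(instance_id, region_name=None):
    """Query SSM for one instance (requires boto3 and AWS credentials)."""
    # Imported lazily so that managed_node_state can be used without the SDK.
    import boto3

    ssm = boto3.client("ssm", region_name=region_name)
    return ssm.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
```

A result where managed is False, or where ping_status is anything other than Online, corresponds to the cases that require the additional troubleshooting described above.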

Note: The output differs depending on the Systems Manager document that you use. For more information, see AWS Systems Manager documents.

Review SSM Agent logs

Review the SSM Agent logs for more details about the Run Command document failure:

For Linux and macOS, locate the logs at the following paths:

  • /var/log/amazon/ssm/amazon-ssm-agent.log
  • /var/log/amazon/ssm/errors.log
  • /var/log/amazon/ssm/audits/amazon-ssm-agent-audit-YYYY-MM-DD

Note: SSM Agent writes stderr and stdout files to the /var/lib/amazon/ssm directory.

For Windows, locate the logs at the following paths:

  • %PROGRAMDATA%\Amazon\SSM\Logs\amazon-ssm-agent.log
  • %PROGRAMDATA%\Amazon\SSM\Logs\errors.log
  • %PROGRAMDATA%\Amazon\SSM\Logs\audits\amazon-ssm-agent-audit-YYYY-MM-DD
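To narrow down a large log file, a small filter such as the following sketch can help. It uses the log locations listed above and assumes that error entries contain the token ERROR, which might not match every SSM Agent version. The function names and the C:\ProgramData fallback are assumptions for illustration.

```python
import os
import platform
from pathlib import Path


def default_log_dir(system=None):
    """Return the SSM Agent log directory for the given or current OS."""
    system = system or platform.system()
    if system == "Windows":
        # Falls back to the usual ProgramData location if the variable is unset.
        base = os.environ.get("PROGRAMDATA", r"C:\ProgramData")
        return Path(base) / "Amazon" / "SSM" / "Logs"
    return Path("/var/log/amazon/ssm")


def error_lines(lines, markers=("ERROR",)):
    """Keep only the lines that contain one of the marker strings."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in markers)]


def scan_log(path, limit=20):
    """Return up to `limit` of the most recent matching lines in a log file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return error_lines(handle)[-limit:]
```

For example, calling scan_log on default_log_dir() / "errors.log" returns the most recent matching lines. Reading the files usually requires root or administrator permissions.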

Related information

Understanding automation statuses

AWS OFFICIAL, updated 10 months ago