Can Systems Manager Target Only Instances That Are Running?


I am using an Automation Runbook to install and configure CloudWatch Agent on managed instances. When I execute my Runbook on a tag-based Resource Group, it also attempts to run on instances that are not in the "running" state, and on stopped instances it eventually times out. How can I target only the running instances in my Resource Group for execution? Things I have tried or looked into:

  • Adding a step at the start of my Runbook that uses assertAwsResourceProperty to call DescribeInstanceStatus and assert "running". This prevents the timeout, but the call returns nothing for a stopped instance, so the step aborts and the execution is marked as failed. This is undesirable to me, because I don't see the Runbook as having failed; it skipped execution for a legitimate reason. Furthermore, when run on a large batch of machines, a single stopped instance causes the entire parent execution to be marked as failed.
  • Filtering my Resource Group to only contain running instances. I saw no way of doing this.
  • Adding a Property to my Runbook so that it can only be run on a running instance. Similar to how I use the TargetType property to limit it to /AWS::EC2::Instance. This made the most sense to me, because as the developer I know my Runbook can only be successful if the instance is running. I wanted something akin to being able to set my TargetType as service-provider::service-name::data-type-name::status. I haven't found any way of doing this.
  • Applying a filter when picking my Targets for execution in the Systems Manager console. This is another place where it made sense that I might find it, but I didn't. If, after I chose the Resource Group, the interactive instance picker updated to display only the instances from that group, that would be unwieldy but would at least let me manually deselect instances that are not running. Neither appears to be the case.

Is there a simple way to accomplish this that I have overlooked?

  • This sounds like a use case for State Manager instead of an Automation Runbook. Is there a particular reason you're using the latter instead of the former?

  • I'm applying a different CloudWatch Agent config (configs stored in SSM Parameters) to each environment, using tags to set an instance's environment. My Runbook determines an instance's environment, checks for a corresponding SSM Parameter path, then installs and configures the Agent. When I previously tried a State Manager association applied to all instances, it didn't execute when an instance's environment tag was changed or created, so I settled on an EventBridge rule for aws.tag events as the trigger. When a config is updated or newly created, however, I execute the Runbook manually on the affected Resource Groups.

  • You can invoke an association synchronously and on demand by calling the StartAssociationsOnce action, also available from the CLI. Have you tried that already? You can tie that to an EventBridge event via a Lambda function as well.

  • Yes, but if your automation is idempotent -- as any well-written automation should be -- it shouldn't adversely impact any of your existing instances.

  • That's true, but my other goal here is to limit the number of executions. With my current approach, managing X instances with a 10-step automation executes 10 steps each time a tag changes; running it on all instances instead makes that 10*X steps per tag change. I'm also afraid it doesn't solve my original problem, as it appears that a State Manager association also attempts to invoke the Runbook on stopped instances.
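
A minimal sketch of the StartAssociationsOnce suggestion, assuming a Python Lambda function invoked by the EventBridge aws.tag rule mentioned above. The ASSOCIATION_ID environment variable and the handler's extra ssm_client parameter (added only so it can be tested without AWS) are hypothetical; start_associations_once is the boto3 call behind the StartAssociationsOnce action. Note that it runs the association against all of its targets, so on its own it does not skip stopped instances.

```python
import os
from typing import Any, Optional


def lambda_handler(event: dict, context: Any, ssm_client: Optional[Any] = None) -> dict:
    """Run a State Manager association once when EventBridge reports a tag change.

    ASSOCIATION_ID is a hypothetical environment variable holding the ID of
    the association that installs and configures the CloudWatch Agent.
    """
    association_id = os.environ["ASSOCIATION_ID"]

    # The client is injectable for testing; inside Lambda, fall back to boto3.
    if ssm_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        ssm_client = boto3.client("ssm")

    # StartAssociationsOnce runs the association immediately on all of its
    # targets; it does not filter out stopped instances by itself.
    ssm_client.start_associations_once(AssociationIds=[association_id])

    # EventBridge events list the affected resource ARNs in "resources";
    # echo them back for logging.
    return {"associationId": association_id, "resources": event.get("resources", [])}
```

Because the association still fans out to every target, this pairs naturally with a runbook that ends gracefully on instances that aren't running.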

asked 10 months ago, 149 views
2 Answers
Accepted Answer

Here's how I did it with an Automation document, inspired by AWS-User-7841548's response. It only proceeds on EC2 instances whose SSM Agent is online (PingStatus "Online"). The filter is the equivalent of aws ssm describe-instance-information --filters "Key=InstanceIds,Values=xxx"

description: "changeme"
schemaVersion: '0.3'
assumeRole: '{{AutomationAssumeRole}}'
outputs:
  - runShellCommandLinux.Output
  - runPowerShellCommand.Output
  - runShellCommandMac.Output
parameters:
  AutomationAssumeRole:
    default: ''
    type: String
  InstanceId:
    description: (Required) EC2 InstanceId to run command
    type: String
mainSteps:

  - name: GetInstance
    action: 'aws:executeAwsApi'
    inputs:
      Service: ssm
      Api: DescribeInstanceInformation
      Filters:
        - Key: InstanceIds
          Values:
            - '{{ InstanceId }}'
    outputs:
      - Name: myInstance
        Selector: '$.InstanceInformationList[0].InstanceId'
        Type: String
      - Name: platform
        Selector: '$.InstanceInformationList[0].PlatformType'
        Type: String
      - Name: PingStatus
        Selector: '$.InstanceInformationList[0].PingStatus'
        Type: String
      - Name: ResourceType
        Selector: '$.InstanceInformationList[0].ResourceType'
        Type: String
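  # isEnd: true with no Default: anything that isn't an online EC2 instance
  # ends the execution here instead of failing.
  - name: SelectOnlineInstances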
    action: 'aws:branch'
    isEnd: true
    inputs:
      Choices:
        - And:
            - Variable: '{{GetInstance.PingStatus}}'
              StringEquals: Online
            - Variable: '{{GetInstance.ResourceType}}'
              StringEquals: EC2Instance
          NextStep: ChoosePlatform
  - name: ChoosePlatform
    action: 'aws:branch'
    isEnd: true
    inputs:
      Choices:
        - NextStep: runPowerShellCommand
          Variable: '{{GetInstance.platform}}'
          StringEquals: Windows
        - NextStep: runShellCommandLinux
          Variable: '{{GetInstance.platform}}'
          StringEquals: Linux
        - NextStep: runShellCommandMac
          Variable: '{{GetInstance.platform}}'
          StringEquals: MacOS
  - name: runShellCommandLinux
    action: 'aws:runCommand'
    isEnd: true
    inputs:
      DocumentName: AWS-RunShellScript
      InstanceIds:
        - '{{GetInstance.myInstance}}'
      Parameters:
        commands:
          - echo "hello from linux"
  - name: runPowerShellCommand
    action: 'aws:runCommand'
    isEnd: true
    inputs:
      DocumentName: AWS-RunPowerShellScript
      InstanceIds:
        - '{{GetInstance.myInstance}}'
      Parameters:
        commands:
          - Write-Output "hello from windows"
  - name: runShellCommandMac
    action: 'aws:runCommand'
    isEnd: true
    inputs:
      DocumentName: AWS-RunShellScript
      InstanceIds:
        - '{{GetInstance.myInstance}}'
      Parameters:
        commands:
          - echo "hello from mac"
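
For reasoning about edge cases outside Automation, the same decision can be sketched in Python against a DescribeInstanceInformation response (for example, boto3's describe_instance_information with Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]). This is a hypothetical helper, not part of the runbook: returning None stands in for a branch step ending with no matching choice, and treating an empty InstanceInformationList the same way is this sketch's own assumption.

```python
from typing import Optional

# Run Command document per PlatformType, mirroring the ChoosePlatform step.
DOCUMENT_BY_PLATFORM = {
    "Windows": "AWS-RunPowerShellScript",
    "Linux": "AWS-RunShellScript",
    "MacOS": "AWS-RunShellScript",
}


def choose_document(instance_information: dict) -> Optional[str]:
    """Return the Run Command document to use, or None to end without failing.

    `instance_information` is a DescribeInstanceInformation response filtered
    by InstanceIds, as in the GetInstance step.
    """
    instances = instance_information.get("InstanceInformationList", [])
    if not instances:
        # Assumption: an instance unknown to Systems Manager is simply skipped.
        return None
    info = instances[0]
    # SelectOnlineInstances: only online EC2 instances continue.
    if info.get("PingStatus") != "Online" or info.get("ResourceType") != "EC2Instance":
        return None
    # ChoosePlatform: an unrecognized platform matches no branch and ends.
    return DOCUMENT_BY_PLATFORM.get(info.get("PlatformType"))
```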
answered 7 months ago
  • This is much better, because my automation was still timing out and being marked as failed on instances where the SSM Agent wasn't running (a common issue on Windows).


I still haven't found a way to filter my targets to only include running instances, so I thought I'd share the workaround I am using in case it helps anyone else:

As I explained in the original post, the first step of my Automation already checked instance status, using the runbook action aws:assertAwsResourceProperty to call the DescribeInstanceStatus API and assert that the answer is "running". Unfortunately, with aws:assertAwsResourceProperty, if the response doesn't match a value you specify, the step fails. I was unhappy with my executions being marked as failures in cases where instances were not running.

A colleague clued me in to a trick with aws:branch steps: if you set isEnd to true and provide no Default step, an input that matches none of the choices ends the execution gracefully instead of failing it. I had to abandon assertAwsResourceProperty and make the DescribeInstanceStatus call directly in an aws:executeAwsApi step, so I could capture the output and evaluate it myself in such a branch step. It looked like this:

...
"mainSteps": [
  {
    "name": "GetInstanceStatus",
    "action": "aws:executeAwsApi",
    "inputs": {
      "Service": "ec2",
      "Api": "DescribeInstanceStatus",
      "InstanceIds": [
        "{{InstanceId}}"
      ]
    },
    "outputs": [
      {
        "Name": "InstanceState",
        "Selector": "$.InstanceStatuses[0].InstanceState.Name",
        "Type": "String"
      }
    ]
  },
  {
    "name": "CheckInstanceStatus",
    "action": "aws:branch",
    "isEnd": true,
    "inputs": {
      "Choices": [
        {
          "NextStep": "TheRestOfMyRunbook",
          "Variable": "{{GetInstanceStatus.InstanceState}}",
          "StringEquals": "running"
        }
      ]
    }
  },
  {
    "name": "TheRestOfMyRunbook",
    ...
  },
  ...

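The same two steps can be sketched in Python against a DescribeInstanceStatus response, which helps when reasoning about edge cases. The function names are hypothetical and nothing here is part of the runbook. One detail worth knowing: unless IncludeAllInstances is set to true, DescribeInstanceStatus only describes running instances, so for a stopped instance InstanceStatuses is empty and the selector finds nothing. That matches the "returns nothing" behavior described in the question, and either way the branch below does not continue.

```python
from typing import Optional


def instance_state(describe_instance_status_response: dict) -> Optional[str]:
    """Mirror the selector $.InstanceStatuses[0].InstanceState.Name."""
    statuses = describe_instance_status_response.get("InstanceStatuses", [])
    if not statuses:
        # Without IncludeAllInstances=True, stopped instances are not listed,
        # so there is no state to read.
        return None
    return statuses[0]["InstanceState"]["Name"]


def next_step(describe_instance_status_response: dict) -> Optional[str]:
    """Mirror CheckInstanceStatus: continue only when the state is "running".

    None models a branch step with isEnd true and no Default: the execution
    simply ends, instead of failing as aws:assertAwsResourceProperty would.
    """
    if instance_state(describe_instance_status_response) == "running":
        return "TheRestOfMyRunbook"
    return None
```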
Now, whenever a State Manager association, an event-driven Lambda function, or any other trigger feeds my runbook a stopped instance, the execution exits at the second step and is marked as successful.

Another feature that might have made this easier, and that I often find myself reaching for, would be an onFailure option such as "Skip" or "End": some way to simply exit a Runbook without aborting and reporting a failure.

answered 10 months ago
