Running SSM commands after an EC2 instance is created (Terraform)

I have created a Terraform module that creates an EC2 instance. The module also creates aws_ssm_association resources, each of which essentially runs a command on the instance.

My custom command works fine; it sets up Docker Swarm and deploys an application.

The problem is that I also need to install the Amazon CloudWatch agent so I can send logs and metrics to CloudWatch.

I used the AWS-ConfigureAWSPackage document to install AmazonCloudWatchAgent. This only works when I run it from the AWS web console; when I run the same command with Terraform, it fails.

I then used my instance's user_data script to install the AmazonCloudWatchAgent instead.

Then I tried to set up the CloudWatch agent using the AmazonCloudWatch-ManageAgent document. Again, this works only from the AWS web console and not from Terraform.

I get the following error: CloudWatch Agent not installed. Please install it using the AWS-ConfigureAWSPackage SSM Document.

When I log into the EC2 instance after creation, the agent is installed, and when I run the document from the AWS web console, it works.

Is there some race condition here? Is Terraform trying to run these commands too soon after the EC2 instance is created?

I have no idea how to handle this. I would really prefer all configuration on the EC2 instance to be done with Systems Manager rather than relying on the user_data script.

Any ideas would be greatly appreciated.

Asked 2 months ago · 191 views
3 Answers
Accepted Answer

Okay, so I found a solution to my problem. I used the Terraform time_sleep resource to add a delay of about a minute, which gave the instance time to pass its status checks. After that, all the aws_ssm_association resources were created successfully.

Creating the time_sleep resource:

# Sleep
# Wait 1 minute
resource "time_sleep" "wait_60_seconds" {
  depends_on = [aws_instance.ec2-instance]

  create_duration = "60s"
}
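
One detail worth adding: time_sleep is not part of the AWS provider. It comes from the hashicorp/time provider, so the module probably needs to declare it. A minimal sketch, assuming no version pin is required (pinning one is up to you):

```hcl
terraform {
  required_providers {
    # time_sleep is provided by hashicorp/time, not hashicorp/aws.
    # No version constraint is given here; add one that suits your setup.
    time = {
      source = "hashicorp/time"
    }
  }
}
```

After adding the block, run terraform init again so the provider is downloaded.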

You can then make the aws_ssm_association resource depend on the time_sleep resource, so that the association is only created after the 1-minute delay.

# SSM Run command
# Configure Cloudwatch Agent
resource "aws_ssm_association" "cloudwatch-config" {
  name = "AmazonCloudWatch-ManageAgent"

  targets {
    key    = "InstanceIds"
    values = [aws_instance.ec2-instance.id] # Use the correct instance ID from aws_instance
  }

  parameters = {
    action                        = "configure"
    mode                          = "ec2"
    optionalConfigurationSource   = "ssm"
    optionalConfigurationLocation = "CWA_config"
    optionalRestart               = "yes"
  }

  depends_on = [aws_ssm_association.cloudwatch-agent, time_sleep.wait_60_seconds]
}
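
The depends_on above refers to aws_ssm_association.cloudwatch-agent, which is not shown in this answer. As a guess at what such an install association could look like (this is an assumption, not the module's actual code), here is a sketch that uses the AWS-ConfigureAWSPackage document mentioned in the question and waits on the same time_sleep resource:

```hcl
# Hypothetical reconstruction of the install association referenced above.
# The real definition in the module may differ.
resource "aws_ssm_association" "cloudwatch-agent" {
  name = "AWS-ConfigureAWSPackage"

  targets {
    key    = "InstanceIds"
    values = [aws_instance.ec2-instance.id]
  }

  parameters = {
    action = "Install"
    name   = "AmazonCloudWatchAgent"
  }

  # Wait for the delay before asking Systems Manager to run the install.
  depends_on = [time_sleep.wait_60_seconds]
}
```

With this in place, the configure association runs only after both the delay and the install association have been created.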
Answered 1 month ago

It appears that you're encountering a timing issue when trying to install and configure the CloudWatch agent using Terraform and AWS Systems Manager (SSM). This is likely due to a race condition between the EC2 instance creation, the SSM agent initialization, and the execution of your SSM commands.

Here are some suggestions to address this issue:

  1. Ensure proper IAM permissions: Make sure the IAM role associated with your EC2 instance has the AmazonSSMManagedInstanceCore and CloudWatchAgentServerPolicy managed policies attached. This allows the instance to interact with Systems Manager and CloudWatch.

  2. Use a null_resource with a local-exec provisioner: Instead of relying solely on the aws_ssm_association resource, you can use a null_resource with a local-exec provisioner in your Terraform code. This allows you to add a delay before executing the SSM commands, giving the instance and SSM agent time to initialize fully.

  3. Implement a retry mechanism: You can create a script that attempts to run the SSM command multiple times with a delay between attempts. This can help overcome temporary unavailability of the SSM agent.

  4. Schedule the State Manager association: The aws_ssm_association resource already creates a State Manager association, which by default runs as soon as it is created. Adding a schedule_expression makes State Manager re-apply the association periodically, so a first run that fails because the instance was not ready is retried on the next interval.

  5. Verify SSM agent status: Before running your SSM commands, you can add a step to check the status of the SSM agent on the instance. This ensures that the agent is ready to receive commands.

  6. Split the process: Consider separating the CloudWatch agent installation and configuration into two steps. First, ensure the agent is installed (either through user_data or an SSM command), and then use a separate SSM command to configure it.

  7. Use the latest SSM agent: Ensure that your EC2 instance is using the latest version of the SSM agent, as newer versions may have improved handling of race conditions.
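
Suggestions 2, 3 and 5 can be combined into one polling step. The following is a rough sketch only, assuming the AWS CLI is installed and configured (credentials and region) on the machine that runs Terraform, and that the hashicorp/null provider is available. The resource name wait_for_ssm, the 30 attempts and the 10-second pause are arbitrary choices for illustration.

```hcl
# Sketch: poll Systems Manager until the instance reports PingStatus "Online".
resource "null_resource" "wait_for_ssm" {
  # Re-run the check whenever the instance is replaced.
  triggers = {
    instance_id = aws_instance.ec2-instance.id
  }

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    command     = <<-EOT
      for attempt in $(seq 1 30); do
        status=$(aws ssm describe-instance-information \
          --filters "Key=InstanceIds,Values=${aws_instance.ec2-instance.id}" \
          --query "InstanceInformationList[0].PingStatus" \
          --output text)
        if [ "$status" = "Online" ]; then
          exit 0
        fi
        sleep 10
      done
      echo "Instance did not register with Systems Manager in time" >&2
      exit 1
    EOT
  }
}
```

An aws_ssm_association can then list null_resource.wait_for_ssm in its depends_on. Note that this only confirms the SSM agent has registered. It does not confirm that a user_data script installing the CloudWatch agent has finished, so a short extra delay may still be needed in that case.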

By implementing one or more of these strategies, you should be able to reliably install and configure the CloudWatch agent using Terraform and Systems Manager, without relying solely on user_data scripts.
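
For the IAM suggestion, a minimal sketch of an instance role with both managed policies attached might look like the following. The role and profile names are hypothetical; only the two policy ARNs are AWS-defined.

```hcl
# Hypothetical names throughout; adjust to your module's conventions.
resource "aws_iam_role" "instance" {
  name = "ec2-ssm-cloudwatch"

  # Allow the EC2 service to assume this role on behalf of the instance.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Lets the SSM agent on the instance talk to Systems Manager.
resource "aws_iam_role_policy_attachment" "ssm_core" {
  role       = aws_iam_role.instance.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Lets the CloudWatch agent publish logs and metrics.
resource "aws_iam_role_policy_attachment" "cloudwatch_agent" {
  role       = aws_iam_role.instance.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

resource "aws_iam_instance_profile" "instance" {
  name = "ec2-ssm-cloudwatch"
  role = aws_iam_role.instance.name
}
```

The profile is then attached to the instance by setting iam_instance_profile = aws_iam_instance_profile.instance.name on the aws_instance resource.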
Sources
Install and configure the CloudWatch agent using the Amazon EC2 console to add additional metrics - Amazon Elastic Compute Cloud
Download, configure, and run the CloudWatch agent using SSM - Amazon CloudWatch
Resolve why SSM Agent doesn't start on EC2 Windows instance | AWS re:Post
Install and configure the unified CloudWatch Agent with SSM | AWS re:Post

Answered 2 months ago

My approach is to install the CloudWatch agent during the initial boot via user_data. Would this work for you?

https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/deploy-cloudwatch-agent-user-data-script.html
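
As a rough illustration of that approach (not taken from the linked guide), a user_data script could install the agent package at first boot. This sketch assumes an Amazon Linux 2 or Amazon Linux 2023 AMI, where the amazon-cloudwatch-agent package is available through yum. The variable ami_id, the resource name example and the instance type are hypothetical.

```hcl
# Hypothetical input; supply an Amazon Linux 2 or 2023 AMI ID.
variable "ami_id" {
  type = string
}

resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t3.micro" # arbitrary choice for illustration

  # Runs once at first boot and installs the CloudWatch agent package.
  user_data = <<-EOT
    #!/bin/bash
    yum install -y amazon-cloudwatch-agent
  EOT
}
```

The agent still has to be configured and started afterwards, for example with the AmazonCloudWatch-ManageAgent document discussed above.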

Expert
Answered 2 months ago
