Support Automation Workflow (SAW) Runbook: Troubleshoot Amazon CloudWatch Agent
How can I use the AWSSupport-TroubleshootCloudWatchAgent automation runbook to troubleshoot Amazon CloudWatch Agent, or to collect the required support logs from the instance?
In this article, I will show you how to use the AWSSupport-TroubleshootCloudWatchAgent AWS Systems Manager Automation runbook to troubleshoot common issues with the Amazon CloudWatch Agent running on an Amazon EC2 instance. The runbook can also collect and analyze the agent's logs and configuration files, and upload them to an Amazon Simple Storage Service (Amazon S3) bucket of your choice.
How it works
The runbook AWSSupport-TroubleshootCloudWatchAgent contains a series of basic and advanced checks. Both sets of checks support Linux and Windows operating systems, but the advanced checks also require that the Amazon EC2 instance is managed by AWS Systems Manager.
The basic checks include the following troubleshooting steps:
- Checks the selected Amazon EC2 instance for an attached AWS Identity and Access Management (IAM) instance profile.
- Verifies that the IAM instance profile grants the required permissions.
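The two basic checks can be approximated with the AWS SDK for Python (Boto3), as in the minimal sketch below. It assumes Boto3 and valid credentials for the live call, and it simulates an illustrative subset of the actions in the AWS managed policy CloudWatchAgentServerPolicy; the runbook's actual permission list and logic may differ, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Illustrative subset of agent actions (assumption: taken from the AWS managed
# policy CloudWatchAgentServerPolicy; the runbook may check a different list).
AGENT_ACTIONS: List[str] = [
    "cloudwatch:PutMetricData",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "logs:DescribeLogStreams",
]


def instance_profile_arn(describe_instances_response: Dict[str, Any]) -> Optional[str]:
    """Return the instance profile ARN from an ec2:DescribeInstances response, or None."""
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            profile = instance.get("IamInstanceProfile")
            if profile:
                return profile.get("Arn")
    return None


def denied_actions(simulate_response: Dict[str, Any]) -> List[str]:
    """List actions whose iam:SimulatePrincipalPolicy decision is not 'allowed'."""
    return [
        result["EvalActionName"]
        for result in simulate_response.get("EvaluationResults", [])
        if result.get("EvalDecision") != "allowed"
    ]


def check_basic(instance_id: str, region: str) -> List[str]:
    """Run both basic checks against a live account (requires boto3 and credentials)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    iam = boto3.client("iam", region_name=region)
    arn = instance_profile_arn(ec2.describe_instances(InstanceIds=[instance_id]))
    if arn is None:
        return ["no instance profile attached"]
    # The profile name is the last path segment of the instance profile ARN.
    profile_name = arn.split("/")[-1]
    profile = iam.get_instance_profile(InstanceProfileName=profile_name)["InstanceProfile"]
    if not profile["Roles"]:
        return ["instance profile has no role"]
    response = iam.simulate_principal_policy(
        PolicySourceArn=profile["Roles"][0]["Arn"], ActionNames=AGENT_ACTIONS
    )
    return denied_actions(response)
```

The pure helpers, instance_profile_arn and denied_actions, work on plain response dictionaries, so you can test them without AWS access; check_basic returns an empty list when every simulated action is allowed.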
The advanced checks build on the basic checks and add the following:
- Checks the status of the Amazon CloudWatch Agent process on your Amazon EC2 instance.
- Analyzes the Amazon CloudWatch Agent logs for errors and provides targeted guidance to resolve any issues found.
- Bundles and zips any relevant configuration files and the Amazon CloudWatch Agent logs on the instance.
Note 1: Use the optional input parameter S3UploadBucket to upload the bundle directly from your Amazon EC2 instance to an Amazon S3 bucket. When this parameter is set, the runbook deletes the local zip file after the upload.
- Performs connection tests from the instance to the required service endpoints.
Note 2: This test targets the Amazon CloudWatch Logs and Amazon CloudWatch endpoints by default. If your CloudWatch Agent configuration file uses the append_dimensions option, we recommend that you set the runbook input parameter CheckEC2Endpoint to true. This ensures that a connection test to the Amazon EC2 endpoint is also performed, because that configuration option requires access to it. Additionally, if the runbook input parameter RunVpcReachabilityAnalyzer is set to true and this step detects no endpoint connection failures, the automation skips the AnalyzeAWSEndpointReachabilityFromEC2 step so that you don't incur Reachability Analyzer costs.
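The endpoint selection described in Note 2, together with a basic TCP connection test, might look like the following sketch. It assumes the standard commercial-partition hostnames (logs.<region>.amazonaws.com for CloudWatch Logs, monitoring.<region>.amazonaws.com for CloudWatch, ec2.<region>.amazonaws.com for Amazon EC2) and HTTPS on port 443; it ignores proxy settings and other partitions, and the function names are hypothetical.

```python
import json
import socket
from typing import Any, Dict, List


def uses_append_dimensions(agent_config_json: str) -> bool:
    """True if the agent configuration sets metrics.append_dimensions."""
    config: Dict[str, Any] = json.loads(agent_config_json)
    return bool(config.get("metrics", {}).get("append_dimensions"))


def required_endpoints(region: str, check_ec2_endpoint: bool) -> List[str]:
    """CloudWatch Logs and CloudWatch by default, plus Amazon EC2 when requested."""
    endpoints = [f"logs.{region}.amazonaws.com", f"monitoring.{region}.amazonaws.com"]
    if check_ec2_endpoint:
        endpoints.append(f"ec2.{region}.amazonaws.com")
    return endpoints


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Attempt a TCP connection; DNS failures and timeouts both count as failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers socket.gaierror and timeouts
        return False
```

For example, run this on the instance, loop over required_endpoints("us-east-1", uses_append_dimensions(config_text)), and report each host for which can_connect returns False. A successful TCP connection only shows that the port is reachable; it does not validate TLS or credentials.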
When the runbook input parameter RunVpcReachabilityAnalyzer is set to true, the runbook runs the AnalyzeAWSEndpointReachabilityFromEC2 step if either of the following criteria is met:
- The instance is not an AWS Systems Manager managed instance.
- The instance is an AWS Systems Manager managed instance, and at least one of the connectivity checks between the instance and the required endpoints returned a failed result.
In either case, this step can help you troubleshoot DNS issues and Amazon Virtual Private Cloud (Amazon VPC) endpoint connectivity, or verify reachability from your Amazon EC2 instance to outbound networking resources.
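The criteria above reduce to a small decision function. This is a model of the documented behavior for illustration, not the runbook's source; the function and argument names are hypothetical.

```python
def should_run_reachability_analyzer(
    run_vpc_reachability_analyzer: bool,
    is_managed_instance: bool,
    any_endpoint_check_failed: bool,
) -> bool:
    """Model of when the AnalyzeAWSEndpointReachabilityFromEC2 step runs."""
    if not run_vpc_reachability_analyzer:
        return False  # the parameter defaults to false, so the step never runs
    if not is_managed_instance:
        return True  # no in-instance connectivity checks were possible
    # Managed instance: run only if an in-instance connectivity check failed.
    return any_endpoint_check_failed
```

Keeping the managed-instance case conditional on a failed check is what lets the runbook avoid Reachability Analyzer charges when the direct tests already pass.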
Note 3: The AnalyzeAWSEndpointReachabilityFromEC2 step does not directly perform any connectivity checks to the required endpoints. Instead, it uses VPC Reachability Analyzer, which incurs a cost and evaluates your network configuration rather than sending test traffic. For this reason, if you have a complex Amazon VPC routing setup that includes peering connections or resources such as transit gateways, the analysis might not provide an accurate assessment of reachability to the specified AWS endpoints. For pricing details of Reachability Analyzer, see Amazon VPC Pricing.
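The agent status and log analysis parts of the advanced checks can be approximated on the instance itself, as in this minimal sketch. It assumes the JSON printed by amazon-cloudwatch-agent-ctl -a status (with fields such as status and configstatus), the default agent log paths, and the agent's E! (error) and W! (warning) level markers. It only flags lines; unlike the runbook, it does not map errors to targeted guidance, and the helper names are hypothetical.

```python
import json
import re
from typing import Dict, List

# Default log file locations (assumption: standard agent installation paths).
LINUX_LOG = "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
WINDOWS_LOG = r"C:\ProgramData\Amazon\AmazonCloudWatchAgent\Logs\amazon-cloudwatch-agent.log"

# Agent log lines carry level markers such as "E!" (error) and "W!" (warning).
LEVEL_MARKER = re.compile(r"\b([EW])!\s")


def parse_agent_status(ctl_output: str) -> Dict[str, str]:
    """Pick key fields out of the JSON printed by `amazon-cloudwatch-agent-ctl -a status`."""
    data = json.loads(ctl_output)
    return {
        "status": data.get("status", "unknown"),
        "configstatus": data.get("configstatus", "unknown"),
    }


def find_problems(log_text: str) -> Dict[str, List[str]]:
    """Collect error and warning lines from the agent log text."""
    problems: Dict[str, List[str]] = {"errors": [], "warnings": []}
    for line in log_text.splitlines():
        match = LEVEL_MARKER.search(line)
        if match:
            key = "errors" if match.group(1) == "E" else "warnings"
            problems[key].append(line.strip())
    return problems


def analyze_log_file(path: str = LINUX_LOG) -> Dict[str, List[str]]:
    """Read the agent log (run on the instance) and summarize problem lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_problems(handle.read())
```

On Windows, pass WINDOWS_LOG to analyze_log_file instead of relying on the Linux default.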
Required IAM permissions
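The IAM role that you specify for the AutomationAssumeRole parameter (or, if you don't specify a role, the user who starts the runbook) requires the following actions to successfully use the runbook: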
iam:SimulatePrincipalPolicy
iam:GetContextKeysForPrincipalPolicy
iam:GetInstanceProfile
iam:ListAttachedRolePolicies
iam:PassRole
s3:GetBucketAcl
s3:PutObject
s3:GetBucketPolicyStatus
s3:ListBucket
ssm:GetAutomationExecution
ssm:GetCommandInvocation
ssm:DescribeInstanceInformation
ec2:DescribeInstances
To run the AnalyzeAWSEndpointReachabilityFromEC2 step, make sure that your IAM user or role also has the permissions listed in the AnalyzeAWSEndpointReachabilityFromEC2 required IAM permissions section.
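If you create a dedicated role for AutomationAssumeRole, a starting-point identity policy can be generated from the action list above. This minimal sketch uses "Resource": "*" for brevity, which is an assumption you should tighten: scope the S3 actions to your upload bucket and iam:PassRole to the specific role where you can. It does not include the AnalyzeAWSEndpointReachabilityFromEC2 permissions.

```python
import json
from typing import Any, Dict, List

# Actions listed under "Required IAM permissions" for AutomationAssumeRole.
AUTOMATION_ACTIONS: List[str] = [
    "iam:SimulatePrincipalPolicy",
    "iam:GetContextKeysForPrincipalPolicy",
    "iam:GetInstanceProfile",
    "iam:ListAttachedRolePolicies",
    "iam:PassRole",
    "s3:GetBucketAcl",
    "s3:PutObject",
    "s3:GetBucketPolicyStatus",
    "s3:ListBucket",
    "ssm:GetAutomationExecution",
    "ssm:GetCommandInvocation",
    "ssm:DescribeInstanceInformation",
    "ec2:DescribeInstances",
]


def automation_policy(resource: str = "*") -> Dict[str, Any]:
    """Return an identity policy document that allows the listed actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TroubleshootCloudWatchAgent",
                "Effect": "Allow",
                "Action": AUTOMATION_ACTIONS,
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be saved to a file for `aws iam create-policy --policy-document`.
    print(json.dumps(automation_policy(), indent=2))
```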
Instructions
- Navigate to the AWSSupport-TroubleshootCloudWatchAgent runbook in the AWS Systems Manager console.
- Click on Execute automation.
- For the input parameters, enter the following:
- AutomationAssumeRole (optional): This is the Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation will use the permissions of the user that starts this runbook.
- InstanceId (required): The ID of the Amazon EC2 instance you want to troubleshoot the Amazon CloudWatch Agent on.
- S3UploadBucket (optional): The name of the Amazon S3 bucket to which the collected Amazon CloudWatch Agent logs are uploaded. The Amazon EC2 instance profile must have permission to upload files to this bucket. This option also requires the target Amazon EC2 instance to be an AWS Systems Manager managed instance.
- S3BucketOwnerAccountId (optional): The ID of the AWS account that owns the Amazon S3 bucket where you want to upload the Amazon CloudWatch Agent logs. If you do not modify this parameter, the runbook uses the AWS account ID of the user or role in which the automation runs.
- CheckEC2Endpoint (optional): Specify true if your agent configuration uses the append_dimensions option to append Amazon EC2 metric dimensions to the metrics collected by the agent. When append_dimensions is used, the Amazon CloudWatch Agent requires connectivity to the Amazon EC2 endpoint, so the advanced checks perform an additional connectivity test. The default value is false.
- RunVpcReachabilityAnalyzer (optional): Specify true to run the AnalyzeAWSEndpointReachabilityFromEC2 automation step if the advanced checks detect a network issue, or if the specified instance is not a managed instance. The default value is false.
- RetainVpcReachabilityAnalysis (optional): Only relevant if RunVpcReachabilityAnalyzer is true. Specify true to retain the network insights path and related analyses created by VPC Reachability Analyzer. By default, those resources are deleted after a successful analysis. If you retain the analysis, you can visualize it in the Amazon VPC console; the console link is available in the AnalyzeAWSEndpointReachabilityFromEC2 automation output. The default value is false.
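You can also start the runbook programmatically. The following sketch builds the Parameters map for the Boto3 SSM start_automation_execution call, where every value is passed as a list of strings, and assumes the runbook's Boolean parameters accept the strings "true" and "false". Optional parameters that you leave unset are omitted so that the runbook defaults apply. The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def build_parameters(
    instance_id: str,
    automation_assume_role: Optional[str] = None,
    s3_upload_bucket: Optional[str] = None,
    s3_bucket_owner_account_id: Optional[str] = None,
    check_ec2_endpoint: bool = False,
    run_vpc_reachability_analyzer: bool = False,
    retain_vpc_reachability_analysis: bool = False,
) -> Dict[str, List[str]]:
    """Build the Parameters map; SSM Automation expects each value as a list of strings."""
    params: Dict[str, List[str]] = {
        "InstanceId": [instance_id],
        "CheckEC2Endpoint": [str(check_ec2_endpoint).lower()],
        "RunVpcReachabilityAnalyzer": [str(run_vpc_reachability_analyzer).lower()],
        "RetainVpcReachabilityAnalysis": [str(retain_vpc_reachability_analysis).lower()],
    }
    # Omit unset optional parameters so the runbook's own defaults apply.
    if automation_assume_role:
        params["AutomationAssumeRole"] = [automation_assume_role]
    if s3_upload_bucket:
        params["S3UploadBucket"] = [s3_upload_bucket]
    if s3_bucket_owner_account_id:
        params["S3BucketOwnerAccountId"] = [s3_bucket_owner_account_id]
    return params


def start_troubleshooting(params: Dict[str, List[str]], region: str) -> str:
    """Start the runbook and return the execution ID (requires boto3 and credentials)."""
    import boto3  # imported lazily so build_parameters works without boto3

    ssm = boto3.client("ssm", region_name=region)
    response = ssm.start_automation_execution(
        DocumentName="AWSSupport-TroubleshootCloudWatchAgent", Parameters=params
    )
    return response["AutomationExecutionId"]
```

For example, start_troubleshooting(build_parameters("i-0123456789abcdef0", check_ec2_endpoint=True), "us-east-1") starts an execution equivalent to choosing Execute in the console with those two values set.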
The following example demonstrates how to use the AWSSupport-TroubleshootCloudWatchAgent automation runbook to troubleshoot why your Amazon CloudWatch Agent is not running correctly.
- Click on Execute.
- You should see that the automation has been initiated.
- The runbook performs the following steps:
- getInstanceProfile: Verifies whether the provided Amazon EC2 instance has an IAM instance profile attached.
- branchOnInstanceProfileStatus: If an instance profile is attached to the instance, branches the automation to check the instance profile for the necessary permissions.
- verifyIamPermissions: Checks whether the necessary permissions are applied to the instance profile associated with the instance.
- getInstanceInformation: Checks whether the instance has an active AWS Systems Manager Agent (SSM Agent), and fetches the operating system type of the instance.
- branchOnManagedInstance: Branches the automation to perform the extended checks if the instance is managed.
- getAgentStatus: Checks the status of the Amazon CloudWatch Agent on the instance.
- branchOnInstanceOsType: Branches the automation to run the log collection and analysis commands for the instance's operating system.
- analyzeLogs: Analyzes the Amazon CloudWatch Agent logs on Linux and outputs the findings.
- collectLogs: Bundles and outputs the relevant Amazon CloudWatch Agent troubleshooting files on Linux.
- checkEndpointReachability: Checks whether the instance can reach the required endpoints on Linux.
- analyzeLogsWindows: Analyzes the Amazon CloudWatch Agent logs on Windows and outputs the findings.
- collectLogsWindows: Bundles and outputs the relevant Amazon CloudWatch Agent troubleshooting files on Windows.
- checkEndpointReachabilityWindows: Checks whether the instance can reach the required endpoints on Windows.
- branchOnRunVpcReachabilityAnalyzer: Branches the automation to the VPC Reachability Analyzer steps based on the value of RunVpcReachabilityAnalyzer and the results of the extended checks.
- generateEndpoints: Generates the endpoints to check, based on the extended check failures and the value of CheckEC2Endpoint.
- analyzeAwsEndpointReachabilityFromEC2: Calls the automation runbook AWSSupport-AnalyzeAWSEndpointReachabilityFromEC2 to check the reachability of the selected instance to the required endpoints.
- outputFindings: Outputs the results of the automation execution steps.
- After the execution completes, review the Outputs section for the detailed results of the execution.
If the AnalyzeAWSEndpointReachabilityFromEC2 step runs, the execution Outputs section includes a URL for the referenced automation execution. Navigate to this URL to see further details of that automation.
Output when the input parameter RunVpcReachabilityAnalyzer is set to true.
Output from a subsequent execution of the runbook in which one of the endpoint requests in the checkEndpointReachability step resulted in a connection failure. See Note 2 for more information on when this step runs.
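To read the results without the console, you can poll the execution with the Boto3 SSM get_automation_execution call until it reaches a final status, then read its Outputs map. This is a minimal sketch: the set of final statuses is a subset of the documented AutomationExecutionStatus values, the polling interval and limit are arbitrary choices, and the function name is hypothetical.

```python
import time
from typing import Any, Dict, List

# Final statuses (subset of the AutomationExecutionStatus values in the SSM API).
TERMINAL_STATUSES = {
    "Success",
    "Failed",
    "Cancelled",
    "TimedOut",
    "CompletedWithSuccess",
    "CompletedWithFailure",
}


def wait_for_outputs(
    ssm_client: Any, execution_id: str, poll_seconds: float = 15.0, max_polls: int = 120
) -> Dict[str, List[str]]:
    """Poll until the execution finishes, then return its Outputs map."""
    for _ in range(max_polls):
        execution = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            if status != "Success":
                print(f"Execution {execution_id} ended with status {status}")
            return execution.get("Outputs", {})
        time.sleep(poll_seconds)
    raise TimeoutError(f"Execution {execution_id} still running after {max_polls} polls")
```

Pass boto3.client("ssm") and the AutomationExecutionId returned when the execution started. Each key in the returned dictionary is a step output name, and each value is a list of strings.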
Conclusion
In this article, I demonstrated how to troubleshoot Amazon CloudWatch Agent issues on your Amazon EC2 instance using the automation runbook AWSSupport-TroubleshootCloudWatchAgent, available in AWS Systems Manager.
References
Systems Manager Automation
Running a simple automation: https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-working-executing.html
Setting up Automation: https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-setup.html
Relevant content
- asked 9 months agolg...
- Accepted Answerasked a month agolg...
- asked 4 years agolg...
- AWS OFFICIALUpdated 5 months ago
- AWS OFFICIALUpdated 2 months ago