How do I troubleshoot inconsistent CloudWatch metric values, data gaps, and graph discrepancies?
The metric values in my Amazon CloudWatch graphs are inconsistent. My metrics don't appear on the CloudWatch console, and my alarm notifications show values that don't match my current graphs.
Short description
CloudWatch displays metrics according to data retention policies and period settings.
Metrics might appear inconsistent or missing in the following situations:
- You terminate Amazon Elastic Compute Cloud (Amazon EC2) instances and the Amazon EC2 instances no longer publish data.
- You view data that's older than 15 days with 1-minute periods.
- AWS services backfill metric data after alarm evaluation.
- SEARCH expressions don't include the required dimensions.
- You remove metrics from the CloudWatch agent configuration.
Resolution
Retrieve historical metrics from terminated EC2 instances
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Metrics for a terminated instance don't appear on the CloudWatch console or in the output of the list-metrics AWS CLI command because the instance no longer publishes data points. When a metric receives no new data points for 14 days, CloudWatch removes it from the console and from list-metrics output. However, CloudWatch keeps the underlying data for the full retention period of up to 15 months. You can still retrieve the data if you know the metric's namespace, name, and dimensions.
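The listing and retention rules above can be expressed as a small classifier. This is a minimal Python sketch, and `metric_visibility` and its return labels are hypothetical names. The 14-day listing window comes from the text. The 455-day limit is CloudWatch's maximum (15-month) retention for 1-hour data points. The boundaries are approximate.

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the behavior described above.
LISTING_WINDOW = timedelta(days=14)  # no new data for 14 days: hidden from console and list-metrics
RETENTION = timedelta(days=455)      # 1-hour data points are kept for 455 days (15 months)


def metric_visibility(last_datapoint: datetime, now: datetime) -> str:
    """Classify a metric by the age of its most recent data point."""
    age = now - last_datapoint
    if age < LISTING_WINDOW:
        return "listed"  # still appears in the console and in list-metrics
    if age < RETENTION:
        return "hidden-but-retrievable"  # query it with get-metric-data and its dimensions
    return "expired"  # every data point is past the retention period


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    terminated = now - timedelta(days=30)  # hypothetical last data point
    print(metric_visibility(terminated, now))  # hidden-but-retrievable
```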
To retrieve historical data from terminated instances, run the following get-metric-data command with the instance ID:
aws cloudwatch get-metric-data \
  --metric-data-queries '[{"Id":"m1","MetricStat":{"Metric":{"Namespace":"AWS/EC2","MetricName":"METRIC-NAME","Dimensions":[{"Name":"InstanceId","Value":"INSTANCE-ID"}]},"Period":300,"Stat":"Average"}}]' \
  --start-time START-TIME \
  --end-time END-TIME
Note: Replace METRIC-NAME with the name of the metric that you want to retrieve, such as CPUUtilization. Replace INSTANCE-ID with the terminated instance ID. Replace START-TIME and END-TIME with the start and end of your query range in ISO 8601 format, for example 2024-05-01T00:00:00Z.
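It's easy to make mistakes when you write the --metric-data-queries JSON by hand. The following Python sketch uses the standard json module to build the same query as the command above and adds a helper that formats ISO 8601 timestamps. The function names are hypothetical and the instance ID is a placeholder. Pass the printed strings to the AWS CLI yourself.

```python
import json
from datetime import datetime, timedelta, timezone


def terminated_instance_query(instance_id: str, metric_name: str = "CPUUtilization",
                              period: int = 300, stat: str = "Average") -> str:
    """Return a compact JSON string for the --metric-data-queries parameter."""
    query = [{
        "Id": "m1",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            },
            "Period": period,
            "Stat": stat,
        },
    }]
    return json.dumps(query, separators=(",", ":"))


def iso8601(ts: datetime) -> str:
    """Format a timestamp in UTC, for example 2024-05-01T00:00:00Z."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    end = datetime(2024, 6, 1, tzinfo=timezone.utc)
    start = end - timedelta(days=30)
    # Placeholder instance ID: replace it with your terminated instance ID.
    print(terminated_instance_query("i-0123456789abcdef0"))
    print(iso8601(start), iso8601(end))
```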
Match metric periods to data age
CloudWatch might display a flat line when you view a week-long range but show expected spikes when you view a single day. This occurs because CloudWatch retains metric data at different granularities based on age. CloudWatch stores data points with a period shorter than 60 seconds (high-resolution custom metrics) for 3 hours, 1-minute data points for 15 days, 5-minute data points for 63 days, and 1-hour data points for 455 days (15 months).
After 15 days, CloudWatch aggregates 1-minute data and makes it available only at a 5-minute resolution. When you query data that's older than 15 days with a 1-minute period, CloudWatch returns no results because the granularity no longer exists. After 63 days, CloudWatch aggregates 5-minute data and makes it available only at a 1-hour resolution. Not all AWS services publish metrics at 1-minute intervals. Some services publish metrics only at 5-minute intervals.
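To resolve this issue, match the period to the age of the data at the start of your query range. For data within 15 days, use a period that's a multiple of 60 seconds (1 minute). For data between 15 and 63 days, use a multiple of 300 seconds (5 minutes). For data beyond 63 days, use a multiple of 3600 seconds (1 hour).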
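You can check the period rules above before you run a query. This is a minimal sketch that assumes the 15-, 63-, and 455-day tiers that this section describes. It ignores sub-minute high-resolution data, which is available only for the last 3 hours. `minimum_period` and `is_valid_period` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Retention tiers described above: (maximum data age, required base period in seconds).
TIERS = [
    (timedelta(days=15), 60),     # 1-minute data points
    (timedelta(days=63), 300),    # 5-minute data points
    (timedelta(days=455), 3600),  # 1-hour data points
]


def minimum_period(start_time: datetime, now: datetime) -> int:
    """Smallest period that still returns data for a query that starts at start_time."""
    age = now - start_time  # the oldest point in the range decides the period
    for max_age, period in TIERS:
        if age <= max_age:
            return period
    raise ValueError("data older than 455 days is no longer retained")


def is_valid_period(period: int, start_time: datetime, now: datetime) -> bool:
    """A period works if it's a multiple of the minimum period for that data age."""
    return period % minimum_period(start_time, now) == 0


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    start = now - timedelta(days=30)
    print(minimum_period(start, now))  # 300
```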
To retrieve data at the correct period, run the following get-metric-data command:
aws cloudwatch get-metric-data \
  --metric-data-queries '[{"Id":"m1","MetricStat":{"Metric":{"Namespace":"YOUR-NAMESPACE","MetricName":"YOUR-METRIC-NAME"},"Period":300,"Stat":"Sum"}}]' \
  --start-time START-TIME \
  --end-time END-TIME
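Note: Replace YOUR-NAMESPACE with your namespace and YOUR-METRIC-NAME with your metric name. Replace START-TIME and END-TIME with ISO 8601 timestamps. Set the Period value based on the age of the oldest data in your range. Use 60 (or a multiple of 60) for data within 15 days, 300 (or a multiple of 300) for data between 15 and 63 days, or 3600 (or a multiple of 3600) for data beyond 63 days.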
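When CloudWatch aggregates data points into longer periods, the result depends on the statistic. Sum adds the values and SampleCount adds the counts. Minimum and Maximum keep the lowest and highest values. Average divides the total Sum by the total SampleCount. The aggregated data points still account for every original sample, but you can no longer retrieve the data at the finer resolution.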
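The following Python sketch shows why aggregation depends on the statistic. It rolls several finer data points into one coarser data point. Each data point is modeled as a statistic set (SampleCount, Sum, Minimum, Maximum), the same four values that PutMetricData accepts as StatisticValues. The sketch illustrates the arithmetic only. It isn't CloudWatch's internal implementation, and it doesn't cover percentiles.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StatisticSet:
    sample_count: float
    total: float    # the Sum statistic
    minimum: float
    maximum: float

    @property
    def average(self) -> float:
        # Average is always Sum / SampleCount, never an average of averages.
        return self.total / self.sample_count


def roll_up(points: Iterable[StatisticSet]) -> StatisticSet:
    """Combine finer data points into one coarser data point."""
    points = list(points)
    return StatisticSet(
        sample_count=sum(p.sample_count for p in points),  # counts add
        total=sum(p.total for p in points),                # sums add
        minimum=min(p.minimum for p in points),            # lowest of the minimums
        maximum=max(p.maximum for p in points),            # highest of the maximums
    )


if __name__ == "__main__":
    # Hypothetical 1-minute data points: the last one holds two samples (5 and 15).
    minutes = [StatisticSet(1, 10, 10, 10), StatisticSet(1, 90, 90, 90), StatisticSet(2, 20, 5, 15)]
    five = roll_up(minutes)
    print(five.average)  # 30.0
```

If you averaged the three per-minute averages, you would get about 36.67. Weighting by SampleCount gives 30.0, the average of the four original samples.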
Reduce discrepancies between alarm values and graph values
An alarm might activate and show a specific value, but the graph later shows a different value for the same timestamp. This is expected behavior for AWS service metrics and custom metrics when delayed data points arrive after the alarm has already evaluated that period.
For example, at 10:00 UTC a service publishes a value of 5 to a metric. The alarm evaluates and records the value as 5. At 10:03 UTC, the same service publishes additional data points, 4 and 3, to the same 10:00 UTC timestamp. The Average statistic recalculates from 5 to 4. The alarm history shows 5, but the console graph now shows 4.
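You can reproduce the arithmetic in this example directly. The sample values come from the example. The 4.5 threshold and the greater-than comparison are hypothetical additions. They show how the same timestamp can breach the threshold at evaluation time but not on the later graph.

```python
from statistics import mean

# Samples for the 10:00 UTC data point, from the example above.
at_evaluation = [5]         # what the alarm saw at 10:00 UTC
after_backfill = [5, 4, 3]  # what the graph uses after 10:03 UTC

THRESHOLD = 4.5  # hypothetical alarm threshold, greater-than comparison

alarm_value = mean(at_evaluation)
graph_value = mean(after_backfill)

print(alarm_value > THRESHOLD)  # True: the alarm activated
print(graph_value > THRESHOLD)  # False: the graph never crosses the threshold
```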
After the delayed data points arrive, the graph shows a value that's recalculated with the selected statistic. This value can be higher or lower than the value that the alarm used during evaluation. As a result, an alarm can appear to activate even though the graphed metric never crosses the threshold. Or, the graph can cross the threshold without a matching alarm.
To check whether the data point value activated the alarm, complete the following steps:
- Open the CloudWatch console.
- Choose Alarms, and then select your alarm.
- Choose the History tab.
- Review the State update entry. In the New state reason field, find the metric value that CloudWatch used at the time of evaluation.
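You can also read the evaluated value from the AWS CLI. The command `aws cloudwatch describe-alarm-history --alarm-name ALARM-NAME --history-item-type StateUpdate` returns history items with a HistoryData field that contains a JSON string. The following Python sketch extracts the evaluated data points from that string. It assumes a HistoryData layout (newState, stateReasonData, recentDatapoints, threshold) based on typical output, and the sample values are made up. Compare the keys with your own output before you rely on the sketch.

```python
import json

# Hypothetical HistoryData string; the layout is an assumption based on typical
# describe-alarm-history output, and the values are made up.
SAMPLE_HISTORY_DATA = json.dumps({
    "version": "1.0",
    "newState": {
        "stateValue": "ALARM",
        "stateReason": "Threshold Crossed: 1 datapoint [5.0 (01/06/24 10:00:00)] "
                       "was greater than the threshold (4.5).",
        "stateReasonData": {"statistic": "Average", "period": 60,
                            "recentDatapoints": [5.0], "threshold": 4.5},
    },
})


def evaluated_values(history_data: str) -> dict:
    """Pull the values that the alarm used at evaluation time (empty if keys are missing)."""
    new_state = json.loads(history_data).get("newState", {})
    reason_data = new_state.get("stateReasonData", {})
    return {
        "state": new_state.get("stateValue"),
        "datapoints": reason_data.get("recentDatapoints", []),
        "threshold": reason_data.get("threshold"),
    }


print(evaluated_values(SAMPLE_HISTORY_DATA))
# {'state': 'ALARM', 'datapoints': [5.0], 'threshold': 4.5}
```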
To reduce discrepancies between alarm values and graph values, match the alarm period to the service's metric publishing interval. For example, Application Load Balancer publishes metrics every minute, so use a 60-second period. For other services, check the service documentation for publishing intervals.
Configure evaluation settings based on the following monitoring requirements:
- If you have high sensitivity requirements, then evaluate 1 datapoint within 1 evaluation period.
- If you have balanced requirements, then evaluate 2 datapoints within 3 evaluation periods.
- If you have conservative requirements, then evaluate 3 datapoints within 5 evaluation periods.
To reduce alarm activations that a single backfilled data point causes, evaluate more periods. For example, evaluate 2 datapoints within 3 evaluation periods instead of 1 datapoint within 1 evaluation period. The alarm then activates only when at least 2 periods cross the threshold, so one recalculated value is less likely to change the alarm state.
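The following sketch models the M-out-of-N setting. It's a simplified model that assumes a greater-than threshold comparison and no missing data. `should_alarm` is a hypothetical name, and real CloudWatch alarms also apply your missing-data treatment. The sketch shows how a single spiking data point activates a 1-out-of-1 alarm but not a 2-out-of-3 alarm.

```python
from typing import Sequence


def should_alarm(datapoints: Sequence[float], threshold: float,
                 datapoints_to_alarm: int, evaluation_periods: int) -> bool:
    """Return True if at least M of the last N data points exceed the threshold."""
    if evaluation_periods < 1 or datapoints_to_alarm < 1:
        raise ValueError("M and N must both be at least 1")
    recent = datapoints[-evaluation_periods:]  # the last N periods
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= datapoints_to_alarm


if __name__ == "__main__":
    series = [3.0, 3.0, 5.0]  # hypothetical; the last value is a single spike
    print(should_alarm(series, 4.5, 1, 1))  # True
    print(should_alarm(series, 4.5, 2, 3))  # False
```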
Note: CloudWatch can't show data at intervals shorter than the service's publishing interval. For higher-resolution analysis, use service-specific logs, such as Application Load Balancer access logs or AWS Lambda function logs.
Add metric dimensions to the SEARCH expression
The get-metric-data command might return empty results when you query metrics with a SEARCH expression. A schema in braces, such as {NAMESPACE,DIMENSION-NAMES}, matches only metrics that have exactly the listed dimension names. If the schema leaves out one of the metric's dimensions, then CloudWatch can't match the query to existing metrics.
To resolve this issue, use the CloudWatch console or AWS CLI to identify the required dimensions. Then, add the required dimensions to your SEARCH expression.
Use the CloudWatch console
Complete the following steps:
- Open the CloudWatch console.
- Choose Metrics, choose All metrics, and then select your namespace and metric.
- Choose the Graphed metrics tab, and then review the Details column for the metric's dimension names and values.
Use the AWS CLI
Run the following list-metrics command:
aws cloudwatch list-metrics --namespace NAMESPACE --region REGION
Note: Replace NAMESPACE with your namespace and REGION with your AWS Region.
The output shows the metric name and all dimensions for each metric, such as WorkGroup and QueryType. Note each dimension name and its values.
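If you save the list-metrics output as JSON, then a short Python sketch can summarize the dimension names and values for each metric. The sample output, namespace, and metric name are hypothetical. The Metrics, MetricName, Dimensions, Name, and Value keys match the list-metrics output format.

```python
import json
from collections import defaultdict

# Hypothetical, trimmed list-metrics output.
SAMPLE_OUTPUT = """
{"Metrics": [
  {"Namespace": "Example/Namespace", "MetricName": "ExampleMetric",
   "Dimensions": [{"Name": "QueryType", "Value": "DML"},
                  {"Name": "WorkGroup", "Value": "primary"}]},
  {"Namespace": "Example/Namespace", "MetricName": "ExampleMetric",
   "Dimensions": [{"Name": "QueryType", "Value": "DDL"},
                  {"Name": "WorkGroup", "Value": "primary"}]}
]}
"""


def dimension_sets(list_metrics_json: str) -> dict:
    """Map each metric name to its distinct sets of dimension names."""
    result = defaultdict(set)
    for metric in json.loads(list_metrics_json)["Metrics"]:
        names = tuple(sorted(d["Name"] for d in metric.get("Dimensions", [])))
        result[metric["MetricName"]].add(names)
    return dict(result)


def dimension_values(list_metrics_json: str) -> dict:
    """Map each dimension name to the distinct values seen in the output."""
    values = defaultdict(set)
    for metric in json.loads(list_metrics_json)["Metrics"]:
        for d in metric.get("Dimensions", []):
            values[d["Name"]].add(d["Value"])
    return dict(values)


print(dimension_sets(SAMPLE_OUTPUT))
print(dimension_values(SAMPLE_OUTPUT))
```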
Include dimensions in your SEARCH expression
Note: In the following commands, replace NAMESPACE with your namespace. Replace DIMENSION-NAMES with the comma-separated dimension names, for example WorkGroup,QueryType. Replace METRIC-NAME with the name of your metric and REGION with your Region. Replace STATISTIC with the statistic to retrieve, such as Sum or Average.
Run the following get-metric-data command to include the dimensions in your SEARCH expression:
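aws cloudwatch get-metric-data \
  --metric-data-queries '[{"Id":"m1","Expression":"SEARCH('\''{NAMESPACE,DIMENSION-NAMES} METRIC-NAME'\'','\''STATISTIC'\'',86400)","Period":86400}]' \
  --start-time START-TIME \
  --end-time END-TIME \
  --region REGION
Note: The SEARCH expression contains single quotes, so each inner single quote is written as '\'' to keep it from ending the shell's single-quoted string. This escaping works in Bash and other POSIX shells.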
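Shell quoting commonly breaks this command because the JSON is single-quoted and the SEARCH expression also contains single quotes. The following Python sketch builds the expression and uses the standard library for JSON escaping (json.dumps) and POSIX shell quoting (shlex.quote). The helper names and sample values are hypothetical. The sketch doesn't quote namespace or dimension names that contain spaces or special characters, and SEARCH requires you to enclose those names in double quotes.

```python
import json
import shlex
from typing import Sequence


def search_expression(namespace: str, dimension_names: Sequence[str], metric_name: str,
                      statistic: str, period: int) -> str:
    """Build SEARCH('{Namespace,Dim1,Dim2} MetricName', 'Statistic', Period)."""
    schema = "{" + ",".join([namespace, *dimension_names]) + "}"
    return f"SEARCH('{schema} {metric_name}', '{statistic}', {period})"


def metric_data_queries_arg(expression: str, period: int) -> str:
    """Return a shell-quoted value for --metric-data-queries."""
    payload = json.dumps([{"Id": "m1", "Expression": expression, "Period": period}])
    # shlex.quote escapes the single quotes inside SEARCH(...) for POSIX shells.
    return shlex.quote(payload)


if __name__ == "__main__":
    expr = search_expression("Example/Namespace", ["WorkGroup", "QueryType"],
                             "ExampleMetric", "Average", 86400)
    print(expr)
    print("--metric-data-queries", metric_data_queries_arg(expr, 86400))
```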
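If the SEARCH expression continues to return empty results, then run the following get-metric-data command to query the specific metric with all of its dimensions:
aws cloudwatch get-metric-data \
  --metric-data-queries '[{"Id":"m1","MetricStat":{"Metric":{"Namespace":"NAMESPACE","MetricName":"METRIC-NAME","Dimensions":[{"Name":"DIMENSION-NAME-1","Value":"DIMENSION-VALUE-1"},{"Name":"DIMENSION-NAME-2","Value":"DIMENSION-VALUE-2"}]},"Period":86400,"Stat":"STATISTIC"}}]' \
  --start-time START-TIME \
  --end-time END-TIME \
  --region REGION
Note: Add one Name and Value pair to Dimensions for each dimension of the metric, for example WorkGroup and QueryType. Replace DIMENSION-NAME-1 and DIMENSION-NAME-2 with the dimension names. Replace DIMENSION-VALUE-1 and DIMENSION-VALUE-2 with the corresponding values from the list-metrics output.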
Confirm that the service was active during the queried time range. If the service wasn't active, then CloudWatch has no metric data to return. Allow up to 15 minutes for CloudWatch to display newly generated metrics. Also, confirm that the dates are in ISO 8601 format and aren't future dates.
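You can automate the time range checks in the previous paragraph. This is a minimal Python sketch. `check_time_range` is a hypothetical helper, and it's an assumption of this sketch that timestamps without an offset are in UTC. The sketch also checks that the start time is earlier than the end time.

```python
from datetime import datetime, timezone
from typing import List, Optional


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-05-01T00:00:00Z."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+, so normalize it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return parsed


def check_time_range(start: str, end: str, now: Optional[datetime] = None) -> List[str]:
    """Return a list of problems with a get-metric-data time range (empty if none)."""
    now = now or datetime.now(timezone.utc)
    try:
        start_dt, end_dt = parse_iso8601(start), parse_iso8601(end)
    except ValueError as err:
        return [f"not ISO 8601: {err}"]
    problems = []
    if start_dt > now:
        problems.append("start time is in the future")
    if end_dt > now:
        problems.append("end time is in the future")
    if start_dt >= end_dt:
        problems.append("start time must be earlier than end time")
    return problems


if __name__ == "__main__":
    print(check_time_range("2024-05-01T00:00:00Z", "2024-05-02T00:00:00Z"))
```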
Restore metrics after configuration changes to the CloudWatch agent
To restore metric collection, confirm that the metrics_collected section of the agent configuration file includes the metrics that you want to collect. After you update the configuration file, run the amazon-cloudwatch-agent-ctl command with the fetch-config action. This action loads the new configuration and restarts the agent.
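Before you restart the agent, you can check that the configuration still lists the metrics that you expect. The sample configuration is hypothetical. The sketch reads metrics.metrics_collected.<section>.measurement and accepts entries that are either plain strings or objects with a "name" key. It compares names literally, so it treats used_percent and mem_used_percent as different names.

```python
import json
from typing import Dict, List

# Hypothetical excerpt of a CloudWatch agent configuration file.
SAMPLE_CONFIG = """
{
  "metrics": {
    "metrics_collected": {
      "cpu": {"measurement": ["cpu_usage_idle"], "totalcpu": true},
      "mem": {"measurement": [{"name": "mem_used_percent"}]}
    }
  }
}
"""


def missing_metrics(config_json: str, wanted: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per section, the wanted measurements that metrics_collected doesn't list."""
    collected = json.loads(config_json).get("metrics", {}).get("metrics_collected", {})
    missing = {}
    for section, measurements in wanted.items():
        entries = collected.get(section, {}).get("measurement", [])
        # A measurement entry can be a plain name or an object with a "name" key.
        present = {e["name"] if isinstance(e, dict) else e for e in entries}
        absent = [m for m in measurements if m not in present]
        if absent:
            missing[section] = absent
    return missing


print(missing_metrics(SAMPLE_CONFIG, {"mem": ["mem_used_percent"], "disk": ["used_percent"]}))
# {'disk': ['used_percent']}
```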
Related information
Using Amazon CloudWatch alarms
Why did my CloudWatch alarm initiate when the monitored metric doesn't have breaching datapoints?