Normally, the names and namespaces of the metrics are the best indication of their sources. Metrics produced by AWS's standard services follow a systematic naming convention. Custom metrics can be named any way at all, but in most cases, when the effort is made to produce custom metrics, due consideration is given to making them useful by structuring them with proper naming.
If you simply browse the namespace structure and the individual metrics underneath it in the CloudWatch metrics console, is the bulk of the 700,000 metrics located under a small number of namespaces, or is the number of namespaces itself huge? If there is only a handful of namespaces, do their names reveal anything about their source and purpose? If the number of namespaces is large, is there a pattern to them, such as essentially the same namespace being duplicated for a large number of servers or applications?
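If clicking through the console is too slow at this scale, a per-namespace breakdown can also be produced with a small script against the ListMetrics API. This is just a rough sketch with boto3, not an official tool; note that ListMetrics only returns metrics that received data in the last two weeks and is paginated 500 metrics at a time, so walking 700,000 metrics will take a while.

```
# Rough sketch: count currently active metrics per namespace with boto3.
from collections import Counter

import boto3

cloudwatch = boto3.client("cloudwatch")
counts = Counter()

paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate():
    for metric in page["Metrics"]:
        counts[metric["Namespace"]] += 1

# Print the 20 namespaces with the most metrics.
for namespace, count in counts.most_common(20):
    print(f"{count:>8}  {namespace}")
```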
Hi, building on what the experts already shared, I'd like to draw your attention to a resource that may help you analyze your metrics, and provide a few more specific answers to your questions.
Source identification: as pointed out by the experts, your metrics are grouped by namespace, and you may well have a specific namespace with a high concentration of metrics. This can happen, for example, when a dimension has a very high number of values, because every distinct combination of dimension values creates a distinct metric. If that is your case, or if you need a way to quickly drill down into a huge block of metrics in your account, have you seen this blog post? https://aws.amazon.com/blogs/mt/analyzing-your-custom-metrics-spend-contributors-in-amazon-cloudwatch/ - the solution in that post is designed explicitly to surface the biggest contributors to your metric count.
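Building on that, if you already suspect one namespace, a small boto3 sketch like the following can show which dimension name has the most distinct values, i.e. which dimension is multiplying your metric count. "My/App" is a placeholder namespace, and this is only a sketch, not the blog post's solution.

```
# Sketch: count distinct values per dimension name within one namespace.
from collections import defaultdict

import boto3

cloudwatch = boto3.client("cloudwatch")
values_per_dimension = defaultdict(set)

paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="My/App"):  # placeholder namespace
    for metric in page["Metrics"]:
        for dim in metric.get("Dimensions", []):
            values_per_dimension[dim["Name"]].add(dim["Value"])

for name, values in sorted(values_per_dimension.items(),
                           key=lambda item: len(item[1]), reverse=True):
    print(f"{len(values):>8} distinct values for dimension {name}")
```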
Data retention: metric retention is fixed at 15 months and cannot be changed. A metric disappears from the console list after no data has been published to it for 2 weeks, but the metric data remains queryable through the API or by specifying the metric name manually in the source of a dashboard widget. Note that this retention does not add any cost; you only need archiving if you want to keep metric data for more than 15 months.
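For example, here is a minimal boto3 sketch (namespace, metric name and dimension are placeholders) showing that a metric which has dropped out of the console list can still be read back through GetMetricData, as long as the data is within the 15-month retention.

```
# Sketch: query data for a metric that no longer appears in the console list.
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_data(
    MetricDataQueries=[{
        "Id": "m1",
        "MetricStat": {
            "Metric": {
                "Namespace": "My/App",              # placeholder
                "MetricName": "RequestCount",       # placeholder
                "Dimensions": [{"Name": "Service", "Value": "checkout"}],
            },
            "Period": 3600,  # older data points are stored at coarser resolution
            "Stat": "Sum",
        },
    }],
    StartTime=datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=180),
    EndTime=datetime.datetime.now(datetime.timezone.utc),
)

print(response["MetricDataResults"][0]["Timestamps"][:5])
print(response["MetricDataResults"][0]["Values"][:5])
```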
Cost: you incur spend only when you ingest metric values, so the way to reduce cost is to identify the source and what is driving the high number of metrics. If it's a dimension with too many values and you don't need that level of granularity, change the way the metric is emitted to reduce the number of dimension values. For example, if you emit a 4xx error count per API endpoint and per session, the session id will have a huge number of values and create a huge number of metrics. If all you really need is the 4xx error count per API endpoint, removing the session id from the dimensions when you send the data greatly reduces the number of metrics and drives down the cost. Similarly, if you are creating a metric from a metric filter with a dimension whose values come from the logs and there are too many of them, you could either use a less verbose dimension or create a Contributor Insights rule to analyze the data instead of creating a metric. Those are just two examples; I hope they, or the blog post, are useful. There is also a dedicated documentation page at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_billing.html
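To make the 4xx example concrete, here is a hedged PutMetricData sketch with made-up metric and dimension names, showing how dropping the session id dimension changes the number of metrics you create.

```
# Illustration of the dimension-reduction idea; names are made up.
import boto3

cloudwatch = boto3.client("cloudwatch")

# High cardinality: every new session id spawns a brand-new custom metric.
cloudwatch.put_metric_data(
    Namespace="My/App",
    MetricData=[{
        "MetricName": "4xxErrorCount",
        "Dimensions": [
            {"Name": "Endpoint", "Value": "/checkout"},
            {"Name": "SessionId", "Value": "a1b2c3d4"},  # thousands of values
        ],
        "Value": 1,
        "Unit": "Count",
    }],
)

# Lower cardinality: dropping SessionId keeps the metric count proportional
# to the number of endpoints only.
cloudwatch.put_metric_data(
    Namespace="My/App",
    MetricData=[{
        "MetricName": "4xxErrorCount",
        "Dimensions": [{"Name": "Endpoint", "Value": "/checkout"}],
        "Value": 1,
        "Unit": "Count",
    }],
)
```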
Certainly, I can help you address the challenges you're facing with your large volume of AWS CloudWatch metrics. Let's break this down into the areas you've mentioned:
- Source Identification: To identify the sources of your 700,000 metrics, you can use AWS Cost Explorer and AWS Cost and Usage Reports integrated with Amazon Athena. These tools can help you break your CloudWatch usage and charges down by usage type and resource. You can also use the CloudWatch console to view metrics by namespace, which can give you insights into which AWS services are generating the most metrics.
- Data Retention and Archiving: CloudWatch metrics are retained for 15 months; this retention period is fixed and cannot be modified, and it incurs no extra charge. For metrics that need to be kept for compliance or long-term analysis beyond that window, consider exporting the data to Amazon S3 (for example through CloudWatch metric streams) for more cost-effective long-term storage.
- Cost Optimization: To optimize costs with such a high volume of metrics:
- Stop publishing metrics you no longer need (custom metrics cannot be deleted manually, but they stop incurring charges once no data is sent) and remove unused alarms.
- Use the CloudWatch embedded metric format instead of calling the PutMetricData API directly, which can be more cost-effective for high-cardinality data.
- Batch your data when sending metrics to CloudWatch.
- Avoid using high-cardinality dimensions in your metrics.
- Consider using metric streams to send only necessary metrics to other AWS services or external monitoring tools.
- Alternative Storage Solutions: For non-essential metrics or those that don't require real-time access:
- Use the CloudWatch embedded metric format to ingest high-cardinality application data as logs (a short EMF sketch follows this list). This allows you to generate actionable metrics while reducing costs associated with the PutMetricData API.
- Consider using Amazon S3 for long-term storage of historical metric data. You can use AWS Glue and Amazon Athena to query and analyze this data when needed.
- For custom-developed applications, work with your developers to implement important metrics using the CloudWatch embedded metric format, which can help reduce costs and simplify metric collection, especially for Lambda functions and containers.
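To illustrate the embedded metric format mentioned in the lists above, here is a minimal hand-written EMF record in Python. This is a sketch with illustrative names only ("My/App", "Endpoint", the request id); in a Lambda function, printing such a JSON line to stdout is enough for CloudWatch to extract the metric from the log event, while outside Lambda you would send it through the CloudWatch agent or the CloudWatch Logs API.

```
# Sketch of a hand-rolled embedded metric format (EMF) record.
import json
import time

def emit_4xx_count(endpoint: str, count: int, request_id: str) -> None:
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": "My/App",              # placeholder namespace
                "Dimensions": [["Endpoint"]],       # only Endpoint becomes a metric dimension
                "Metrics": [{"Name": "4xxErrorCount", "Unit": "Count"}],
            }],
        },
        "Endpoint": endpoint,
        "4xxErrorCount": count,
        # High-cardinality context stays queryable in Logs Insights
        # without creating extra metrics.
        "requestId": request_id,
    }
    print(json.dumps(record))

emit_4xx_count("/checkout", 1, "a1b2c3d4")
```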
Remember to regularly review and optimize your CloudWatch usage. Set up proper tagging for your resources to make it easier to identify and manage metrics associated with specific applications or teams. Also, consider using AWS Systems Manager for automating the deployment and management of the CloudWatch agent across your EC2 instances and on-premises servers.
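As a starting point for that kind of regular review (and for the Cost Explorer suggestion under Source Identification above), here is a hedged boto3 sketch that breaks the last 30 days of CloudWatch spend down by usage type. It assumes Cost Explorer is enabled in the account, and "AmazonCloudWatch" is the service name as it usually appears in Cost Explorer, which is worth verifying in your own billing data.

```
# Sketch: break recent CloudWatch spend down by usage type via Cost Explorer.
import datetime

import boto3

ce = boto3.client("ce")
today = datetime.date.today()
start = (today - datetime.timedelta(days=30)).isoformat()

response = ce.get_cost_and_usage(
    TimePeriod={"Start": start, "End": today.isoformat()},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["AmazonCloudWatch"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for result in response["ResultsByTime"]:
    for group in result["Groups"]:
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{amount:>12.2f} USD  {group['Keys'][0]}")
```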
By implementing these strategies, you should be able to better understand the sources of your metrics, manage your data more effectively, optimize costs, and find appropriate storage solutions for your CloudWatch metrics.
Sources
Analyzing, optimizing, and reducing CloudWatch costs - Amazon CloudWatch
Observability - AWS Cloud Adoption Framework: Operations Perspective
Application-level CloudWatch configuration - AWS Prescriptive Guidance
PERF02-BP03 Collect compute-related metrics - AWS Well-Architected Framework
Just to be clear: you can't manually delete metrics. They are retained for 15 months and age out automatically.