Optimizing CloudWatch Container Insights Costs: A Comprehensive Guide
This guide focuses specifically on optimizing Container Insights application logs costs, which represent one component of your EKS cluster's observability costs. While a typical EKS deployment includes multiple logging components, we'll concentrate on Container Insights application logs optimization techniques while providing references for optimizing other components.
Table of Contents
- Introduction
- Prerequisites
- Understanding Your Container Insights and EKS Logging Costs
- Scope and Architecture
- Log Optimization Steps
- Cost Impact Analysis
- Monitoring and Maintenance
- Best Practices and Recommendations
- Conclusion
- Resources
Introduction
Amazon CloudWatch Container Insights helps you monitor and troubleshoot containerized applications and microservices. However, without proper optimization, costs can grow significantly. In this post, we'll explore how to analyze and optimize Container Insights costs effectively using a practical example.
Prerequisites
- An existing EKS cluster with Container Insights enabled
- AWS CLI configured with appropriate permissions
- Basic understanding of Kubernetes and CloudWatch
- Access to modify Fluent Bit configurations
Understanding Your Container Insights and EKS Logging Costs
Before diving into optimization strategies, it's crucial to understand your current cost distribution across different observability components.
To analyze your costs, first follow the setup instructions in Using AWS Cost and Usage Reports with Athena to create and query your Cost and Usage Reports.
Once set up, use this query to break down your Container Insights and EKS logging costs by purpose:
SELECT
    line_item_resource_id AS ResourceID,
    line_item_operation AS Operation,
    CASE
        WHEN line_item_resource_id LIKE '%/aws/eks/%/core-containers%' THEN 'Control Plane Logs'
        WHEN line_item_resource_id LIKE '%/aws/eks/%/containers%' THEN 'Platform Container Logs'
        WHEN line_item_resource_id LIKE '%/aws/eks/%/cluster%' THEN 'Cluster Level Logs'
        WHEN line_item_resource_id LIKE '%/aws/containerinsights/%/performance%' THEN 'Container Insights Performance Metrics'
        WHEN line_item_resource_id LIKE '%/aws/containerinsights/%/prometheus%' THEN 'Prometheus Metrics'
        WHEN line_item_resource_id LIKE '%/aws/containerinsights/%/application%' THEN 'Container Insights Application Logs'
        WHEN line_item_resource_id = '' THEN 'EMF Metrics Storage'
        ELSE 'Other'
    END AS Purpose,
    SUM(CAST(line_item_unblended_cost AS decimal(16,8))) AS TotalSpend
FROM costandusagereport
WHERE product_product_name = 'AmazonCloudWatch'
    AND line_item_usage_account_id = '123456789123' -- Replace with your account ID
    AND line_item_operation IN (
        'MetricStorage:AWS/Logs-EMF', -- Embedded Metrics
        'PutLogEvents',               -- Logs Ingestion
        'HourlyStorageMetering'       -- Logs Storage
    )
    AND line_item_line_item_type NOT IN ('Tax','Credit','Refund','EdpDiscount','Fee','RIFee')
    AND (
        line_item_resource_id LIKE '%log-group:/aws/containerinsights%'
        OR line_item_resource_id LIKE '%log-group:/aws/eks%'
        OR line_item_resource_id = ''
    )
GROUP BY line_item_resource_id, line_item_operation
ORDER BY TotalSpend DESC
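If you prefer to run the query from the command line rather than the Athena console, here is a minimal sketch using the AWS CLI. The database name (cur_database), results bucket, and query file name are placeholders to replace with your own values:

# Save the query above to a local file, then submit it to Athena
aws athena start-query-execution \
  --query-string file://cur_query.sql \
  --query-execution-context Database=cur_database \
  --result-configuration OutputLocation=s3://my-athena-results-bucket/cur/

# Fetch the results once the execution completes (use the returned QueryExecutionId)
aws athena get-query-results --query-execution-id <query-execution-id>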
Example results showing typical cost patterns:
| ResourceID | Operation | Purpose | TotalSpend ($) |
|---|---|---|---|
| /aws/containerinsights/cluster-prod/application | PutLogEvents | Application Logs | 450.36 |
| (no resource ID) | MetricStorage:AWS/Logs-EMF | EMF Metrics Storage | 266.70 |
| /aws/eks/cluster-prod/containers | PutLogEvents | Platform Logs | 131.49 |
| /aws/containerinsights/cluster-prod/prometheus | PutLogEvents | Prometheus Metrics | 98.75 |
| /aws/eks/cluster-prod/core-containers | PutLogEvents | Control Plane Logs | 45.60 |
Scope and Architecture
Kubernetes Cluster
├── CloudWatch Agent DaemonSet
│   ├── Performance Metrics
│   │   └── /aws/containerinsights/*/performance
│   │       (EMF metrics - essential monitoring, no optimization) [Article: No]
│   │
│   └── Prometheus Metrics
│       └── /aws/containerinsights/*/prometheus
│           (Custom metrics - review collection settings) [Article: No]
│
└── Fluent Bit DaemonSet
    ├── Container Insights Logs
    │   └── /aws/containerinsights/*/application
    │       (Primary optimization target - filtering, sampling) [Article: Yes]
    │
    └── EKS Logs
        ├── Platform Logs
        │   └── /aws/eks/*/containers
        │       (High cost - consider selective logging) [Article: No]
        │
        ├── Control Plane Logs
        │   └── /aws/eks/*/core-containers
        │       (Critical system logs - keep full logging) [Article: No]
        │
        └── Cluster Logs
            └── /aws/eks/*/cluster
                (Cluster-level events - keep full logging) [Article: No]
Based on the cost analysis from our Athena query, you can see various components contributing to your EKS observability costs. This guide focuses specifically on optimizing Container Insights application logs (marked as [Article: Yes] above).
For other components, including EKS platform and control plane logs optimization, see the observability cost optimization section of the Amazon EKS Best Practice Guide.
Our optimization techniques will focus on the Fluent Bit configuration for Container Insights application logs, where we can achieve significant cost savings (up to 96.5% log volume reduction) while maintaining observability.
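Before changing anything, it helps to confirm which of these log groups exist in your account and how much data each one holds. A quick check with the AWS CLI, assuming a cluster named cluster-prod (adjust the prefixes for your cluster name):

# List Container Insights and EKS log groups with their stored size in bytes
aws logs describe-log-groups \
  --log-group-name-prefix "/aws/containerinsights/cluster-prod" \
  --query 'logGroups[].[logGroupName,storedBytes]' \
  --output table

aws logs describe-log-groups \
  --log-group-name-prefix "/aws/eks/cluster-prod" \
  --query 'logGroups[].[logGroupName,storedBytes]' \
  --output table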
Log Optimization Steps
Now that we understand which components we're targeting, let's explore four progressive optimization steps for Container Insights application logs using Fluent Bit configuration:
- Configure Log Filtering
- Implement Log Level Filtering
- Implement Log Sampling
- Optimize Batch Processing
Step 1: Configure Log Filtering
Purpose: Exclude logs from specific namespaces to reduce log volume.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        K8S-Logging.Exclude On
        # Add these namespaces to exclude their logs
        K8S-Logging.Exclude_Namespaces kube-system monitoring ingress-nginx
Impact: Reduces log volume by excluding system and monitoring namespaces
- Before: 50GB/month
- After: 35GB/month (-30%)
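To quantify the reduction in your own environment, compare ingestion for the application log group before and after the change. A sketch using the IncomingBytes metric; the cluster name and date range are placeholders:

# Daily ingested bytes for the application log group over a 30-day window
aws cloudwatch get-metric-statistics \
  --namespace AWS/Logs \
  --metric-name IncomingBytes \
  --dimensions Name=LogGroupName,Value=/aws/containerinsights/cluster-prod/application \
  --statistics Sum \
  --period 86400 \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-31T00:00:00Z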
Step 2: Implement Log Level Filtering
Purpose: Remove less critical log entries (INFO and DEBUG) to focus on important logs. Add this filter to your existing configuration:
[FILTER]
Name grep
Match kube.*
# This will exclude logs containing INFO or DEBUG
Exclude log INFO|DEBUG
Impact: Reduces remaining log volume by filtering out INFO and DEBUG logs
- Before: 35GB/month
- After: 17.5GB/month (-50%)
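Keep in mind that the grep filter's Exclude rule applies a regular expression to the value of the log key, so any line containing the strings INFO or DEBUG is dropped, including lines where those words merely appear in message text. To spot-check what is still being ingested, you can search recent events; this sketch assumes a bash shell and the example cluster name:

# Check whether any INFO entries are still arriving (last hour)
aws logs filter-log-events \
  --log-group-name /aws/containerinsights/cluster-prod/application \
  --filter-pattern '"INFO"' \
  --start-time $(( ($(date +%s) - 3600) * 1000 )) \
  --max-items 5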
Step 3: Implement Log Sampling
Purpose: Sample a percentage of the remaining logs to further reduce volume while maintaining representative data.
Add this filter to your existing configuration:
[FILTER]
Name sample
Match kube.*
Rate 10 # Sample 10% of logs
Impact: Reduces log volume by sampling only 10% of logs
- Before: 17.5GB/month
- After: 1.75GB/month (-90%)
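After adding a filter, it's worth confirming that Fluent Bit loaded the new configuration without errors. Assuming the standard Container Insights deployment (Fluent Bit running in the amazon-cloudwatch namespace with the k8s-app=fluent-bit label):

# Look for configuration errors in the Fluent Bit pods after a restart
kubectl logs -n amazon-cloudwatch -l k8s-app=fluent-bit --tail=100 | grep -iE 'error|invalid'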
Step 4: Optimize Batch Processing
Purpose: Optimize how logs are sent to CloudWatch to reduce API costs and improve performance. Add these parameters to your CloudWatch output configuration:
[OUTPUT]
Name cloudwatch_logs
Match kube.*
region ${AWS_REGION}
log_group_name /aws/containerinsights/${CLUSTER_NAME}/application
# Increase batch size to reduce API calls
batch_size 10000 # Default is 1000
# Wait longer to collect more logs in each batch
batch_timeout 60 # Default is 30
# Set retention period to manage storage costs
log_retention_days 14 # Default is never expire
Impact: While this step doesn't directly reduce log volume, it:
- Reduces API calls by batching more logs together
- Manages storage costs through retention policies
- Improves overall performance
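If your Fluent Bit version doesn't expose a retention option on the output plugin, or you prefer to manage retention independently of the log shipper, you can set it directly on the log group:

# Set a 14-day retention policy on the application log group
aws logs put-retention-policy \
  --log-group-name /aws/containerinsights/cluster-prod/application \
  --retention-in-days 14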
Complete configuration example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        K8S-Logging.Exclude On
        K8S-Logging.Exclude_Namespaces kube-system monitoring ingress-nginx

    [FILTER]
        Name    grep
        Match   kube.*
        Exclude log INFO|DEBUG

    [FILTER]
        Name  sample
        Match kube.*
        Rate  10

    [OUTPUT]
        Name               cloudwatch_logs
        Match              kube.*
        region             ${AWS_REGION}
        log_group_name     /aws/containerinsights/${CLUSTER_NAME}/application
        batch_size         10000
        batch_timeout      60
        log_retention_days 14
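To roll out the updated configuration, apply the ConfigMap and restart the Fluent Bit DaemonSet so the pods reload it. The namespace and DaemonSet name below assume the standard Container Insights setup; adjust them to match your deployment:

kubectl apply -f fluent-bit-config.yaml
kubectl rollout restart daemonset/fluent-bit -n amazon-cloudwatch
kubectl rollout status daemonset/fluent-bit -n amazon-cloudwatch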
Cost Impact Analysis
Let's analyze the impact of these optimizations:
Cost Reduction Journey
Initial State ($450.36 - Application Logs PutLogEvents)
│
├── Step 1: Namespace Exclusion
│   ├── Logs: -30% (-$135.11/month)
│   └── New cost: $315.25
│
├── Step 2: Log Level Filtering
│   ├── Logs: -50% (-$157.62/month)
│   └── New cost: $157.63
│
├── Step 3: Sampling (10%)
│   ├── Logs: -90% (-$141.87/month)
│   └── New cost: $15.76
│
└── Step 4: Batch Processing
    └── Reduced API costs through batching
Final Cost: $15.76/month for application logs
EMF Metrics: $266.70/month (unchanged)
Total Monthly Savings on Application Logs: $434.60 (96.5% reduction)
Note: EMF Metrics costs ($266.70/month) remain unchanged in our optimization scenario because these metrics are extracted from the /aws/containerinsights/*/performance logs, which are not the target of our optimization steps.
Monitoring and Maintenance
After implementing these optimizations, it's crucial to monitor both cost effectiveness and system observability.
Cost Monitoring
Monitor these CloudWatch metrics:
- IncomingBytes and IncomingLogEvents for log volume
- ResourceCount for Container Insights metrics
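Beyond watching the raw metrics, consider an alarm on unexpected ingestion growth, which often signals a misconfigured filter or a new chatty workload. A sketch that alerts when daily ingestion for the application log group exceeds roughly 2 GB; the threshold is an example value you should tune to your own baseline:

aws cloudwatch put-metric-alarm \
  --alarm-name application-logs-volume-spike \
  --namespace AWS/Logs \
  --metric-name IncomingBytes \
  --dimensions Name=LogGroupName,Value=/aws/containerinsights/cluster-prod/application \
  --statistic Sum \
  --period 86400 \
  --threshold 2000000000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions ${SNS_TOPIC_ARN}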
Observability Monitoring
To ensure optimization hasn't impacted your observability, monitor these aspects:
- Critical Event Detection
  - Monitor incident detection time
  - Track error visibility
  - Verify critical error logging
- Application Health Monitoring
  - Application performance metrics
  - Service-level indicators (SLIs)
  - Business event tracking
- Infrastructure Visibility
  - Container restart monitoring
  - Node health checks
  - Service availability metrics
Example alert for monitoring logging effectiveness:
# Example: Alert if error logs drop significantly (might indicate logging issues)
aws cloudwatch put-metric-alarm \
  --alarm-name error-logs-missing \
  --metric-name IncomingLogEvents \
  --namespace AWS/Logs \
  --statistic Sum \
  --period 3600 \
  --threshold 10 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions ${SNS_TOPIC_ARN}
Best Practices and Recommendations
1. Implementation Strategy
- Start with Non-Production
  - Test configurations in development environments first
  - Validate impact on troubleshooting capabilities
  - Document baseline metrics before changes
- Gradual Implementation
  - Follow steps 1-4 in sequence
  - Allow time between changes to assess impact
  - Keep team informed of changes and expectations
- Documentation
  - Record excluded namespaces and reasoning
  - Document logging levels for each application
  - Maintain change history and impact assessments
2. Ongoing Optimization
Regular Monitoring
Weekly Tasks:
├── Review error rates in sampled logs
├── Check for missing critical events
└── Validate batch processing performance
Monthly Tasks:
├── Analyze cost trends
├── Review namespace exclusions
└── Adjust sampling rates if needed
Quarterly Tasks:
├── Full cost-benefit analysis
├── Update retention policies
└── Review overall observability effectiveness
3. Observability Balance
- Critical Systems
  - Keep ERROR and WARN logs unsampled
  - Maintain full logging for security events
  - Consider separate log groups for critical components
- Environment-Specific Settings

Production:
├── Conservative sampling (25-50%)
├── Retain all error logs
└── Full metrics collection

Staging:
├── Moderate sampling (10-25%)
├── Basic error logging
└── Selected metrics

Development:
├── Aggressive sampling (5-10%)
├── Minimal logging
└── Limited metrics
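One way to keep a single configuration source across these environments is to template the sampling rate and render it at deploy time. A minimal sketch; the __SAMPLE_RATE__ placeholder and template file name are our own convention, not a Fluent Bit feature:

# Render the environment-specific config from a shared template
SAMPLE_RATE=25   # e.g. 25 for production, 10 for staging, 5 for development
sed "s/__SAMPLE_RATE__/${SAMPLE_RATE}/" fluent-bit-config.template.yaml > fluent-bit-config.yaml
kubectl apply -f fluent-bit-config.yaml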
Storage Class Considerations
- Container Insights components (logs and EMF metrics) require the Standard log group class
- The Infrequent Access (IA) log class is not supported for Container Insights
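To confirm that your Container Insights log groups use the Standard class, recent AWS CLI versions return a logGroupClass field from describe-log-groups:

aws logs describe-log-groups \
  --log-group-name-prefix "/aws/containerinsights" \
  --query 'logGroups[].[logGroupName,logGroupClass]' \
  --output table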
Conclusion
Our systematic approach to Container Insights optimization demonstrated significant cost savings while maintaining observability:
Key Achievements
Cost Reduction:
├── Total savings: $434.60/month (96.5%)
├── Application logs cost: $450.36 → $15.76
└── Log volume reduction: 96.5%
Maintained Capabilities:
├── Critical error detection
├── Performance monitoring
└── System troubleshooting
While performance metrics constitute a significant cost ($266.70/month), our log optimization strategy provided substantial savings with minimal operational impact. The key is finding the right balance between cost optimization and maintaining effective system observability.
Next Steps
- Implement monitoring dashboards
- Establish regular review cycles
- Document optimization results
- Plan for continuous improvement
Resources
Official Documentation
- Analyzing, optimizing, and reducing CloudWatch costs
- AWS CloudWatch Container Insights Metrics Documentation
- Fluent Bit Documentation for Container Insights
- CloudWatch Pricing