To investigate the Aurora PostgreSQL cost increase related to Performance Insights, Cost Explorer, and CloudWatch logs and metrics, we need to analyze several aspects:
Aurora:StorageIOUsage and IOPS: Yes, Aurora:StorageIOUsage is directly related to the IOPS (Input/Output Operations Per Second) you're seeing in the graphs. The cost spikes on October 8th, October 14th, and November 7th correspond to the increased ReadIOPS observed in the graphs for the reader and writer instances. The StorageIOUsage metric represents the total amount of I/O operations, which includes both read and write operations.
Change in ReadIOPS pattern: The sudden change from a short spike on November 6th to extended high values on November 7th could be due to several factors:
- A long-running query or batch job
- Data replication or backup processes
- Increased application load or user activity
- Index rebuilding or database maintenance tasks
To pinpoint the exact cause, you'd need to correlate this with application logs and database query logs.
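As a starting point for that correlation, the ReadIOPS data around the spike window can be pulled programmatically from CloudWatch. A minimal sketch, assuming boto3 and a hypothetical instance identifier (`my-aurora-instance`); the dates come from the question, the year is assumed:

```python
from datetime import datetime, timezone

def read_iops_request(instance_id, start, end, period=300):
    """Build a CloudWatch GetMetricStatistics request for Aurora ReadIOPS.

    Send it with: boto3.client("cloudwatch").get_metric_statistics(**params)
    """
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "ReadIOPS",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,  # 5-minute granularity
        "Statistics": ["Average", "Maximum"],
    }

# Window around the November 7th spike (year assumed for illustration)
params = read_iops_request(
    "my-aurora-instance",  # hypothetical identifier
    datetime(2024, 11, 6, tzinfo=timezone.utc),
    datetime(2024, 11, 8, tzinfo=timezone.utc),
)
```

Comparing the `Maximum` and `Average` series over this window helps distinguish a brief spike from a sustained plateau.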
Autovacuum and increased ReadIOPS costs: Autovacuum can indeed contribute to increased ReadIOPS and, consequently, higher costs. The CloudWatch Logs query you've shown is looking for operations that took more than 1000 seconds (about 16.7 minutes). If these long-running operations are autovacuum processes, they could be causing significant I/O activity, leading to increased costs.
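To check whether those long operations are in fact autovacuum, the same Logs Insights approach can be narrowed to autovacuum log lines. A hedged sketch (your original query isn't shown, so the filter pattern below is an assumption based on the standard `automatic vacuum of table ... elapsed: N s` log format, which requires `log_autovacuum_min_duration` to be enabled):

```python
def autovacuum_query(min_elapsed_s=1000):
    """Build a CloudWatch Logs Insights query for autovacuum runs that took
    longer than min_elapsed_s seconds.

    Assumes autovacuum completions are written to the PostgreSQL log in the
    standard 'automatic vacuum of table ... elapsed: N s' format.
    """
    return (
        "filter @message like /automatic vacuum/ "
        "| parse @message /elapsed: (?<elapsed_s>[\\d.]+) s/ "
        f"| filter elapsed_s > {min_elapsed_s} "
        "| sort elapsed_s desc "
        "| limit 20"
    )

query = autovacuum_query()
```

If the tables surfaced by this query line up in time with the ReadIOPS plateau, autovacuum is a strong candidate for the extra I/O.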
Additional ways to investigate cost spikes:
- Analyze query performance using Performance Insights
- Check for long-running queries in PostgreSQL logs
- Monitor CPU utilization and memory usage
- Examine network throughput metrics
- Look at the number of client connections over time
- Analyze I/O patterns for read, write, and metadata operations
Performance Insights and cost spike: The extra "IO:DataFileRead" activity shown in blue in the Performance Insights graph is likely related to the Aurora:StorageIOUsage and the cost spike. This wait event indicates increased read operations from data files, which would contribute to higher I/O usage and, consequently, higher costs.
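To quantify that contribution, the Performance Insights API can break database load down by wait event, isolating IO:DataFileRead from the rest. A minimal sketch, assuming boto3 and a placeholder `DbiResourceId`:

```python
from datetime import datetime, timezone

def data_file_read_request(resource_id, start, end):
    """Build a Performance Insights GetResourceMetrics request that slices
    db.load.avg by wait event, so IO:DataFileRead can be isolated.

    Send it with: boto3.client("pi").get_resource_metrics(**params)
    """
    return {
        "ServiceType": "RDS",
        "Identifier": resource_id,  # the instance's DbiResourceId (placeholder below)
        "StartTime": start,
        "EndTime": end,
        "PeriodInSeconds": 300,
        "MetricQueries": [{
            "Metric": "db.load.avg",
            "GroupBy": {"Group": "db.wait_event", "Limit": 10},
        }],
    }

params = data_file_read_request(
    "db-EXAMPLE123",  # placeholder resource ID
    datetime(2024, 11, 7, tzinfo=timezone.utc),
    datetime(2024, 11, 8, tzinfo=timezone.utc),
)
```

Grouping by `db.sql` instead of `db.wait_event` in the same request identifies which statements are driving the reads.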
To further investigate and potentially mitigate these issues:
- Review and optimize your database queries
- Consider implementing connection pooling to reduce connection churn
- Analyze and adjust your autovacuum settings if necessary
- Monitor and tune your instance's resources (CPU, memory, network)
- Consider scaling your instance size if the workload consistently exceeds current capacity
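One of the mitigation items above is adjusting autovacuum settings; in Aurora these live in the DB cluster parameter group. A hedged sketch of the boto3 `modify_db_cluster_parameter_group` arguments (the group name and values are illustrative placeholders, not tuning recommendations):

```python
def autovacuum_tuning_params(group_name):
    """Build arguments for rds.modify_db_cluster_parameter_group that make
    autovacuum trigger earlier while keeping its I/O throttled.

    Values are illustrative placeholders -- tune against your own workload.
    Apply with: boto3.client("rds").modify_db_cluster_parameter_group(**request)
    """
    return {
        "DBClusterParameterGroupName": group_name,
        "Parameters": [
            {"ParameterName": "autovacuum_vacuum_scale_factor",
             "ParameterValue": "0.05", "ApplyMethod": "immediate"},
            {"ParameterName": "autovacuum_vacuum_cost_limit",
             "ParameterValue": "200", "ApplyMethod": "immediate"},
        ],
    }

request = autovacuum_tuning_params("my-aurora-cluster-params")  # placeholder name
```

A lower scale factor makes vacuum run more often on large tables (smaller, cheaper passes), while the cost limit caps how much I/O each pass can consume.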
Remember to establish performance baselines and regularly monitor your Aurora PostgreSQL cluster to identify anomalies and address issues proactively.
Sources
Analyzing PostgreSQL logs using CloudWatch Logs Insights - Amazon Aurora
Using Amazon CloudWatch metrics to analyze resource usage for Aurora PostgreSQL - Amazon Aurora
Managing Aurora PostgreSQL connection churn with pooling - Amazon Aurora
Monitoring metrics in an Amazon Aurora cluster - Amazon Aurora
For a production environment, beyond Cost Explorer, I recommend deploying the full Cloud Intelligence Dashboards Framework.
I suggest deploying all three foundational dashboards (CUDOS, CID, and KPI), plus the Compute Optimizer Dashboard, the Trusted Advisor Organizational Dashboard, and the Cost Anomaly Dashboard. These are low-cost, low-effort, and high value.
It's also worth the minimal effort to deploy the remaining advanced dashboards, at least the Graviton Savings Dashboard and the Extended Support - Cost Projection. I also suggest the Health Events Dashboard: it provides useful information when reviewing and correlating a service event and documenting an RCA.
Thoughts? We all benefit from feedback! Thanks in advance!