CloudWatch memory usage alert triggers, but metrics show no corresponding event


We set up an alert that triggers when our ECS containers breach 95% memory usage. On one instance, this alert now triggers multiple times a day, even though the metrics show fairly stable utilization of around 73%. There are no spikes and no missing data.

Here is today's data (April 24, 2024), with alerts around 17:30 but no corresponding spike in the metric: [screenshot: Alarm overview 2024-04-24]

And here is the metrics view, showing all relevant series (memory reserved, memory utilized, and the percentage calculation), with no events around 17:30: [screenshot: Metrics overview 2024-04-24]

We need guidance on what is going on here. In its current state, the alert is useless because it fires without an actual problem.

Thanks

asked 14 days ago, 123 views
2 Answers

I would suggest the following steps to resolve this issue:

1. Verify Alert Configuration: Check that the alert threshold is set to 95% and that the alert targets the correct ECS containers.

2. Confirm Metric Accuracy: Double-check the metric data in CloudWatch Metrics to make sure it accurately reflects memory usage.

3. Review ECS Setup: Check the ECS container configurations and look for memory-intensive tasks or other issues.

4. Monitor Alarm State Changes: Look for patterns in the alarm triggers in the CloudWatch alarm history.

5. Adjust Alert Actions: Review the alert actions and adjust them if necessary.

For more details, see the documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
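For steps 1 and 4, it can help to dump the alarm's effective settings instead of reading them off the console, and then compare the problematic alarm with one that behaves. A minimal Python sketch, assuming boto3 is installed and AWS credentials are configured; `fetch_alarm` and `key_settings` are hypothetical helper names, and the keys read are those of a `MetricAlarms` entry returned by `DescribeAlarms`:

```python
from typing import Any, Optional


def fetch_alarm(alarm_name: str) -> dict[str, Any]:
    """Fetch one metric alarm's definition via DescribeAlarms (needs AWS credentials)."""
    import boto3  # imported lazily so key_settings() works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    return response["MetricAlarms"][0]


def key_settings(alarm: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Pick out the settings that decide when the alarm fires.

    For metric-math alarms the statistic lives inside alarm["Metrics"],
    so "statistic" comes back as None for those.
    """
    evaluation_periods = alarm.get("EvaluationPeriods")
    return {
        # Percentile alarms use ExtendedStatistic (e.g. "p95") instead of Statistic.
        "statistic": alarm.get("ExtendedStatistic") or alarm.get("Statistic"),
        "threshold": alarm.get("Threshold"),
        "comparison": alarm.get("ComparisonOperator"),
        "period_seconds": alarm.get("Period"),
        "evaluation_periods": evaluation_periods,
        # If DatapointsToAlarm is unset, every evaluation period must breach.
        "datapoints_to_alarm": alarm.get("DatapointsToAlarm", evaluation_periods),
        # CloudWatch behaves as "missing" when TreatMissingData is not set.
        "treat_missing_data": alarm.get("TreatMissingData", "missing"),
        # Only relevant for percentile statistics: "evaluate" or "ignore".
        "low_sample_percentiles": alarm.get("EvaluateLowSampleCountPercentile"),
    }
```

Usage would look like `print(key_settings(fetch_alarm("my-ecs-memory-alarm")))`, where the alarm name is a placeholder. A difference in statistic, period, or datapoints-to-alarm between the failing alarm and the working ones is a good first lead.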

answered 14 days ago
  • I double- and triple-checked the whole setup, and everything looks correct. We have the same setup for several other ECS instances, and only this one has the problem. It might have been temporary, because it hasn't happened since.

Accepted Answer

Hello.

What is your CloudWatch alarm's "treat missing data" setting?
Depending on this setting, the alarm can fire even when the metrics look normal.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
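To see how this one setting can change the outcome for the same data, here is a deliberately simplified Python model of an M-out-of-N evaluation. It is an illustration under stated assumptions, not CloudWatch's actual algorithm (which, for example, may also retrieve datapoints from before the evaluation window when some are missing). `evaluate_window` is a hypothetical name, `None` stands for a missing datapoint, and the comparison is assumed to be "greater than threshold":

```python
from typing import Optional, Sequence


def evaluate_window(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    datapoints_to_alarm: int,
    treat_missing: str,
    current_state: str,
) -> str:
    """Simplified M-out-of-N evaluation with a greater-than comparison.

    None marks a missing datapoint. Illustration only; the real
    CloudWatch logic covers more cases than this.
    """
    if treat_missing == "breaching":
        # Missing datapoints count as breaching.
        breaches = [v is None or v > threshold for v in datapoints]
    elif treat_missing == "notBreaching":
        # Missing datapoints count as within the threshold.
        breaches = [v is not None and v > threshold for v in datapoints]
    elif treat_missing in ("missing", "ignore"):
        present = [v for v in datapoints if v is not None]
        if not present:
            # No data at all: "ignore" keeps the current state,
            # "missing" moves the alarm to INSUFFICIENT_DATA.
            return current_state if treat_missing == "ignore" else "INSUFFICIENT_DATA"
        breaches = [v > threshold for v in present]
    else:
        raise ValueError(f"unknown treat_missing value: {treat_missing}")
    return "ALARM" if sum(breaches) >= datapoints_to_alarm else "OK"


# Steady ~73% usage with one gap; only the missing-data setting differs.
window = [73.0, None, 73.5]
for mode in ("missing", "ignore", "breaching", "notBreaching"):
    # Prints ALARM for "breaching" and OK for the other three modes.
    print(mode, evaluate_window(window, 95.0, 1, mode, "OK"))
```

In this model, a single gap in an otherwise flat series is enough to fire the alarm under "breaching", which is the kind of "alarm without a visible spike" the setting can produce.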

You may also be able to find the reason for the alarm in the CloudWatch alarm history.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
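The state reason recorded in the history typically includes the datapoint values the alarm actually evaluated, which can then be compared against the graph for 17:30. A minimal Python sketch, assuming boto3 and configured credentials; the helper names are hypothetical, and the parser assumes that `HistoryData` is a JSON string with `oldState`/`newState` objects, as CloudWatch returns for `StateUpdate` items (worth checking against your own output):

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, Iterator


def fetch_state_updates(alarm_name: str, hours: int = 24) -> list[dict[str, Any]]:
    """Fetch StateUpdate history items for one alarm (needs AWS credentials)."""
    import boto3  # imported lazily so state_changes() works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    items: list[dict[str, Any]] = []
    paginator = cloudwatch.get_paginator("describe_alarm_history")
    for page in paginator.paginate(
        AlarmName=alarm_name,
        HistoryItemType="StateUpdate",
        StartDate=end - timedelta(hours=hours),
        EndDate=end,
    ):
        items.extend(page["AlarmHistoryItems"])
    return items


def state_changes(items: Iterable[dict[str, Any]]) -> Iterator[tuple[Any, str, str, str]]:
    """Yield (timestamp, old_state, new_state, reason) for each history item."""
    for item in items:
        data = json.loads(item["HistoryData"])
        old = data.get("oldState", {})
        new = data.get("newState", {})
        yield (
            item["Timestamp"],
            old.get("stateValue", "?"),
            new.get("stateValue", "?"),
            new.get("stateReason", ""),
        )
```

For example, `for ts, old, new, reason in state_changes(fetch_state_updates("my-ecs-memory-alarm")): print(ts, old, "->", new, reason)`, with a placeholder alarm name.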

EXPERT
answered 14 days ago
  • It is set to treat missing data as missing, but we are evaluating percentiles with a low number of samples, so I will take a look at that setting.
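The percentile point can be made concrete. Per the CloudWatch documentation, for a percentile p from 50 up to (but not including) 100, the `EvaluateLowSampleCountPercentile` setting applies when a period has fewer than 10/(1 - p) samples (200 for p95), and for p below 50 when it has fewer than 10/p. With only a handful of samples, a high percentile is effectively the maximum, so one brief burst can cross 95% while a graph of the Average stays flat. This is one plausible explanation, not something confirmed in the thread. A minimal Python sketch; the sample values are made up, and nearest-rank is an assumed method that is not necessarily how CloudWatch computes percentiles:

```python
import math


def low_sample_limit(percentile: float) -> float:
    """Sample count below which EvaluateLowSampleCountPercentile applies,
    following the rule in the CloudWatch alarm docs.
    `percentile` is written as in the statistic name, e.g. 95 for p95."""
    if 50 <= percentile < 100:
        return 10 * 100 / (100 - percentile)
    if 0 < percentile < 50:
        return 10 * 100 / percentile
    raise ValueError("percentile must be in (0, 100)")


def nearest_rank(samples: list[float], percentile: float) -> float:
    """Nearest-rank percentile (illustrative method, assumed here)."""
    ordered = sorted(samples)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical single period: ~73% throughout except one short burst.
samples = [72.8, 73.1, 73.0, 72.9, 73.2, 73.0, 72.7, 73.1, 96.4, 73.0]
print(low_sample_limit(95))              # 200.0 samples needed for a "normal" p95
print(sum(samples) / len(samples))       # average stays around 75
print(nearest_rank(samples, 95))         # 96.4: p95 of 10 samples is the maximum
```

With `EvaluateLowSampleCountPercentile` set to `ignore`, CloudWatch holds the current alarm state during such low-sample periods; with `evaluate`, it evaluates them anyway and can fire on a single outlier.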
