CloudWatch memory usage alert triggers, but metrics show no corresponding event


We set up an alert that triggers when our ECS containers exceed 95% memory usage. On one instance, this alert now triggers multiple times a day, even though the metrics show fairly stable utilization of around 73%, with no spikes and no missing data.

Here is today's data (April 24, 2024), with alerts around 17:30 but no corresponding spike in the metric:
[Image: Alarm overview 2024-04-24]

And here is the metrics view, showing all relevant items (memory reserved, memory utilized, and the percentage calculation), with no events around 17:30:
[Image: Metrics overview 2024-04-24]
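The percentage calculation mentioned above presumably has the form 100 × utilized / reserved. Below is a minimal sketch of that arithmetic, assuming both series (for example the Container Insights metrics MemoryUtilized and MemoryReserved, an assumption on my part) are aligned to the same timestamps. The helper names and sample values are hypothetical, not taken from the question.

```python
from typing import List, Sequence


def utilization_percent(utilized: Sequence[float], reserved: Sequence[float]) -> List[float]:
    """Hypothetical helper: 100 * utilized / reserved for each aligned datapoint."""
    if len(utilized) != len(reserved):
        raise ValueError("series must be aligned to the same timestamps")
    return [100.0 * u / r for u, r in zip(utilized, reserved)]


def breaching_indices(percent: Sequence[float], threshold: float = 95.0) -> List[int]:
    """Indices of datapoints strictly above the threshold (GreaterThanThreshold)."""
    return [i for i, p in enumerate(percent) if p > threshold]


if __name__ == "__main__":
    # Illustrative values in MiB, not taken from the question.
    reserved = [2048.0, 2048.0, 2048.0, 2048.0]
    utilized = [1495.0, 1500.0, 1490.0, 1960.0]
    pct = utilization_percent(utilized, reserved)
    print([round(p, 1) for p in pct])  # [73.0, 73.2, 72.8, 95.7]
    print(breaching_indices(pct))      # [3]
```

If the graph and the alarm use a different statistic or period, a single short excursion like the last datapoint can breach the alarm while an averaged graph still looks flat.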

We need guidance on what is going on here. Right now this alert is useless, because it triggers without an actual problem.

Thanks

Asked 25 days ago · 131 views
2 answers

I would suggest the following steps to resolve this issue:

1. Verify the alert configuration: check that the alert threshold is set to 95% and that the alarm targets the right ECS containers.

2. Confirm metric accuracy: double-check the metric data in CloudWatch Metrics to make sure it accurately reflects memory usage.

3. Review the ECS setup: check the ECS container configurations and look for memory-intensive tasks or other issues.

4. Monitor alarm state changes: look at the CloudWatch alarm history for patterns in when the alarm triggers.

5. Adjust alert actions: review the alert actions and adjust them if necessary so that they are appropriate.
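Since the same setup reportedly works for other ECS services, one way to carry out step 1 is to diff the suspect alarm against a known-good one. This is a minimal sketch, assuming each alarm is a dict shaped like an entry of the `MetricAlarms` list that boto3's `describe_alarms(AlarmNames=[...])` returns; the function name, the chosen field list, and the example values are hypothetical.

```python
from typing import Any, Dict, Tuple

# MetricAlarm fields (as returned by CloudWatch DescribeAlarms) that change
# when and how the alarm fires. Dimensions are deliberately left out because
# they are expected to differ between services.
RELEVANT_FIELDS = (
    "Namespace", "MetricName", "Metrics", "Statistic", "ExtendedStatistic",
    "Period", "EvaluationPeriods", "DatapointsToAlarm",
    "Threshold", "ComparisonOperator",
    "TreatMissingData", "EvaluateLowSampleCountPercentile",
)


def diff_alarm_configs(good: Dict[str, Any], suspect: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (good_value, suspect_value)} for each relevant field that differs."""
    return {
        field: (good.get(field), suspect.get(field))
        for field in RELEVANT_FIELDS
        if good.get(field) != suspect.get(field)
    }


if __name__ == "__main__":
    # Hypothetical alarm definitions, not the asker's real configuration.
    good = {"ExtendedStatistic": "p99", "Period": 60, "Threshold": 95.0,
            "ComparisonOperator": "GreaterThanThreshold",
            "TreatMissingData": "missing", "EvaluateLowSampleCountPercentile": "ignore"}
    suspect = dict(good, EvaluateLowSampleCountPercentile="evaluate")
    print(diff_alarm_configs(good, suspect))
    # {'EvaluateLowSampleCountPercentile': ('ignore', 'evaluate')}
```

An empty result means the fields listed here match. Anything outside `RELEVANT_FIELDS`, such as the metric data itself, still needs to be checked separately.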

See the documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms

Answered 25 days ago
  • I double- and triple-checked the whole setup, and everything looks correct. We use the same setup for several other ECS instances and have this problem only with this one. It might have been temporary, because it hasn't happened since.

Accepted answer

Hello.

What is your CloudWatch alarm's "treat missing data" setting?
Depending on this setting, an alarm can fire even when the metrics appear normal.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
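To make the four documented options (missing, ignore, breaching, notBreaching) concrete, here is a deliberately simplified sketch of an "M out of N" alarm with a GreaterThanThreshold comparison. It assumes a fixed evaluation window and does not model how CloudWatch looks back at additional older datapoints when some are missing, so treat it as an illustration of the options rather than the service's exact algorithm. The function name is hypothetical.

```python
from typing import Optional, Sequence

TREAT_MISSING = ("missing", "ignore", "breaching", "notBreaching")


def evaluate_window(
    values: Sequence[Optional[float]],  # None marks a missing datapoint
    threshold: float,
    datapoints_to_alarm: int,
    treat_missing: str,
    current_state: str,
) -> str:
    """Simplified model of an M-out-of-N alarm using GreaterThanThreshold."""
    if treat_missing not in TREAT_MISSING:
        raise ValueError(f"unknown treat_missing value: {treat_missing}")
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    breaching = sum(1 for v in present if v > threshold)

    if treat_missing == "breaching":
        # Missing datapoints count as "bad" and breaching.
        breaching += n_missing
    elif treat_missing in ("missing", "ignore") and not present:
        # Every datapoint in the window is missing.
        return "INSUFFICIENT_DATA" if treat_missing == "missing" else current_state
    # notBreaching: missing datapoints count as "good", so they add nothing.
    # missing/ignore with partial data: only present datapoints are evaluated
    # here (the real service also looks further back; not modeled).
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    window = [73.0, None, None]
    for mode in TREAT_MISSING:
        print(mode, evaluate_window(window, 95.0, 1, mode, "OK"))
```

With `missing`, as in the asker's setup, gaps alone should not push this model into ALARM. Only a present datapoint above the threshold does that.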

You may also be able to find the reason the alarm fired by looking at the CloudWatch alarm history.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#common-features-of-alarms
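The alarm history can also be pulled programmatically, for example with boto3's `describe_alarm_history(AlarmName=..., HistoryItemType="StateUpdate")`. The sketch below only parses the returned items. It assumes each `StateUpdate` item carries a `HistoryData` JSON string with `oldState`/`newState` objects, and that `newState.stateReasonData` may contain an `evaluatedDatapoints` list. Verify those field names against your own output. The function name and the sample item are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def alarm_transitions(history_items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Extract transitions into ALARM from DescribeAlarmHistory items."""
    out = []
    for item in history_items:
        # Only StateUpdate items describe state changes.
        if item.get("HistoryItemType") != "StateUpdate":
            continue
        data = json.loads(item.get("HistoryData", "{}"))
        new_state = data.get("newState", {})
        if new_state.get("stateValue") != "ALARM":
            continue
        reason_data = new_state.get("stateReasonData", {})
        out.append({
            "timestamp": item.get("Timestamp"),
            "from": data.get("oldState", {}).get("stateValue"),
            "reason": new_state.get("stateReason"),
            # Assumed field name; check it against your own HistoryData.
            "evaluated": reason_data.get("evaluatedDatapoints", []),
        })
    return out


if __name__ == "__main__":
    # Made-up history item for illustration only.
    sample = [{
        "Timestamp": "2024-01-01T00:00:00Z",
        "HistoryItemType": "StateUpdate",
        "HistoryData": json.dumps({
            "oldState": {"stateValue": "OK"},
            "newState": {
                "stateValue": "ALARM",
                "stateReason": "hypothetical example reason",
                "stateReasonData": {"evaluatedDatapoints": [
                    {"timestamp": "2024-01-01T00:00:00Z", "sampleCount": 2.0, "value": 96.1}]},
            },
        }),
    }]
    for t in alarm_transitions(sample):
        print(t["timestamp"], t["from"], "->", "ALARM", t["evaluated"])
```

A low `sampleCount` on the evaluated datapoints would be worth comparing with what the metrics graph shows for the same minute.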

Expert
Answered 25 days ago
  • It is set to treat missing data as missing, but we are evaluating percentiles with low sample counts, so I will take a look at that setting.
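The low-sample-count setting (`EvaluateLowSampleCountPercentile`, values `evaluate` or `ignore`) matters because a high percentile over only a few samples is effectively the maximum, so one brief spike can cross 95% while the average stays near 73%. This is a minimal sketch of that effect. The significance cut-off follows the CloudWatch documentation as I read it (fewer than 10/(1-p) samples for 0.5 ≤ p < 1, fewer than 10/p for 0 < p < 0.5). The nearest-rank percentile is an assumption, since CloudWatch's exact percentile method may differ. All function names are hypothetical.

```python
import math
from typing import Sequence


def min_significant_samples(p: float) -> float:
    """Sample count below which the low-sample-count setting applies."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return 10.0 / (1.0 - p) if p >= 0.5 else 10.0 / p


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (an assumption; CloudWatch may interpolate)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def percentile_alarm_state(samples: Sequence[float], p: float, threshold: float,
                           low_sample_mode: str, current_state: str) -> str:
    """Evaluate one period of a percentile alarm with GreaterThanThreshold."""
    if low_sample_mode not in ("evaluate", "ignore"):
        raise ValueError(f"unknown mode: {low_sample_mode}")
    if low_sample_mode == "ignore" and len(samples) < min_significant_samples(p):
        return current_state  # too few samples: state left unchanged
    value = nearest_rank_percentile(samples, p)
    return "ALARM" if value > threshold else "OK"


if __name__ == "__main__":
    samples = [73.0] * 9 + [96.0]  # illustrative: 10 samples, one brief spike
    print(sum(samples) / len(samples))                     # average stays near 75
    print(nearest_rank_percentile(samples, 0.99))          # p99 equals the spike
    print(percentile_alarm_state(samples, 0.99, 95.0, "evaluate", "OK"))
    print(percentile_alarm_state(samples, 0.99, 95.0, "ignore", "OK"))
```

Under these assumptions, `evaluate` lets the lone spike trip the alarm, while `ignore` leaves the state unchanged until a period has roughly 1,000 samples for p99.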
