Alarm not getting triggered even if the metric crosses threshold

Hi,

I have configured a CloudWatch alarm based on an expression that involves two metrics.

Expression = m2 - m1, where:

m2: AWS/AutoScaling - GroupInServiceInstances

m1: ECS/ContainerInsights - TaskCount

Alarm config: Expression <= 5 for 1 datapoint within 1 minute

Missing data is treated as breaching the threshold.
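
As a rough sketch, the configuration above could be written as the arguments to boto3's `put_metric_alarm`. The alarm name, the Auto Scaling group and cluster names, the 60-second period, and the `Average` statistic are assumptions for illustration; the post does not state them.

```python
def build_alarm_kwargs(asg_name, cluster_name, period=60, stat="Average"):
    """Build put_metric_alarm arguments for: (m2 - m1) <= 5, 1 out of 1 datapoints."""

    def metric(metric_id, namespace, name, dim_name, dim_value):
        # Input metrics are not returned; only the expression drives the alarm.
        return {
            "Id": metric_id,
            "ReturnData": False,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": name,
                    "Dimensions": [{"Name": dim_name, "Value": dim_value}],
                },
                "Period": period,  # assumed 60 s
                "Stat": stat,      # assumed statistic
            },
        }

    return {
        "AlarmName": "asg-minus-ecs-tasks",  # hypothetical name
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "Threshold": 5.0,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "TreatMissingData": "breaching",
        "Metrics": [
            metric("m1", "ECS/ContainerInsights", "TaskCount",
                   "ClusterName", cluster_name),
            metric("m2", "AWS/AutoScaling", "GroupInServiceInstances",
                   "AutoScalingGroupName", asg_name),
            {"Id": "e1", "Expression": "m2 - m1", "ReturnData": True},
        ],
    }


kwargs = build_alarm_kwargs("my-asg", "my-cluster")  # placeholder names
# To apply it (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```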

The problem is that the alarm is not triggering when expected. Once the expression breaches the threshold (5), the alarm config above (1 datapoint within 1 minute) should trigger it after about a minute. Instead, the alarm triggers after an inconsistent amount of time, which delays the actions (autoscaling) associated with the alarm. The delay ranges from 2 to 15 minutes.
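
To make the expectation concrete, here is a simplified model of the rule as described: one value per 1-minute period, breach when m2 - m1 <= 5, and a missing value counted as breaching. It is only an illustration of the expected behaviour, not CloudWatch's actual evaluator, and the sample values are made up.

```python
from typing import Optional, Sequence


def first_breach_index(
    m2: Sequence[Optional[float]],
    m1: Sequence[Optional[float]],
    threshold: float = 5.0,
) -> Optional[int]:
    """Return the index of the first period in which the alarm rule is breached."""
    for i, (instances, tasks) in enumerate(zip(m2, m1)):
        # Missing data on either metric is treated as a breach.
        if instances is None or tasks is None:
            return i
        if instances - tasks <= threshold:
            return i
    return None


# Hypothetical per-minute values: differences are 10, 8, 5, 4.
print(first_breach_index([20, 20, 20, 20], [10, 12, 15, 16]))  # → 2
```

Under this model the alarm would be expected to fire in the period where the difference first reaches 5, which is the behaviour the delay is being compared against.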

Please refer to this link for a screenshot. The blue line in the first graph shows the expression value and the red line shows the threshold. As can be seen, the expression crosses the threshold at 7:23, but the alarm is triggered at 7:40. The In Alarm state (red bar) in the second chart starts 17 minutes later than it should have.

Any help is really appreciated.

Asked 2 years ago · 952 views
2 answers

From the screenshot, it actually looks like the alarm is breaching when your expression is greater than the threshold, not lower.

Answered 2 years ago

Hi Dhruv,

The alarm configuration and the screenshot you have provided do not quite align with each other.

What I can suggest is to look at the alarm history and check the particular StateUpdate that happened at 7:40, as you mentioned, to understand the reason for the state change. This could shed light on why the alarm triggered when it did. In the StateUpdate history entry, look for a section that starts like the following example:

...
"newState": {
      "stateValue": "ALARM",
      "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints [42.7118644067809 (21/01/22 13:00:00)] was greater than or equal to the threshold (40.0) (minimum 1 datapoint for OK -> ALARM transition).",
...

This section explains how the alarm was triggered and for what reason. The reason data can also confirm the alarm's configured threshold and comparison operator.

Further down, the recentDatapoints, threshold, and evaluatedDatapoints fields within stateReasonData provide more detail about the StateUpdate.
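
If it is easier than reading the console, the same fields can be pulled out of a history entry with a few lines of Python. This is a minimal sketch: `summarize_state_update` is a hypothetical helper, the sample JSON is made up and modelled on the excerpt above, and the exact shape of each stateReasonData field is an assumption.

```python
import json


def summarize_state_update(history_data: str) -> dict:
    """Extract the new state, its reason, and key stateReasonData fields."""
    data = json.loads(history_data)
    new_state = data.get("newState", {})
    reason_data = new_state.get("stateReasonData", {})
    return {
        "state": new_state.get("stateValue"),
        "reason": new_state.get("stateReason"),
        "threshold": reason_data.get("threshold"),
        "recent_datapoints": reason_data.get("recentDatapoints"),
        "evaluated_datapoints": reason_data.get("evaluatedDatapoints"),
    }


# Hypothetical history entry, reduced to the fields discussed above.
sample = json.dumps({
    "newState": {
        "stateValue": "ALARM",
        "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints ...",
        "stateReasonData": {
            "threshold": 40.0,
            "recentDatapoints": [42.7118644067809],
        },
    }
})
summary = summarize_state_update(sample)
print(summary["state"], summary["threshold"])

# Fetching real entries (requires boto3 and AWS credentials; alarm name is a placeholder):
# import boto3
# cw = boto3.client("cloudwatch")
# resp = cw.describe_alarm_history(AlarmName="my-alarm", HistoryItemType="StateUpdate")
# for item in resp["AlarmHistoryItems"]:
#     print(item["Timestamp"], summarize_state_update(item["HistoryData"]))
```

Comparing the timestamps of the evaluated datapoints with the time of the state change should show which datapoint the alarm actually acted on.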

Hope this helps you further troubleshoot your alarm configuration and its transitions into the ALARM state.

AWS
Support Engineer
Answered 2 years ago
  • Hi Munkhbat_T,

    Thanks for the detailed response.

    I am adding more details below from the history section. This is for a different timeframe (24/01/2022) from the one mentioned in the original question.

    Here is the screenshot of the alarm and metric value: https://ibb.co/s6RjBTH. As can be seen here, the alarm triggered at 4:12 when it was supposed to trigger at 3:49.

    Here is a screenshot from the history section for the same duration (bottom rows): https://ibb.co/6gyjkgR

    Here is a screenshot of the state change data for the alarm triggering at 4:12: https://ibb.co/7vLgPQR
