CloudWatch alarm not triggering even though the metric crosses the threshold


Hi,

I have configured a CloudWatch alarm based on a metric math expression that involves two metrics.

Expression = m2 - m1, where,

m2: AWS/AutoScaling - GroupInServiceInstances

m1: ECS/ContainerInsights - TaskCount

Alarm config: Expression <= 5 for 1 datapoint within 1 minute

Missing data is treated as breaching the threshold.
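For reference, this is roughly how such an alarm might be expressed as PutMetricAlarm parameters (e.g. for boto3's `put_metric_alarm`). It is a sketch under assumptions: the alarm name, Auto Scaling group name, and cluster name are hypothetical, and the `ClusterName` dimension for `TaskCount`, the 60-second period, and the `Average` statistic are my guesses, not taken from the actual alarm.

```python
def build_alarm_params(alarm_name, asg_name, cluster_name):
    """Keyword arguments for CloudWatch PutMetricAlarm: alarm when m2 - m1 <= 5.

    Dimension names, period and statistic are assumptions for illustration.
    """
    def metric_query(query_id, namespace, metric_name, dim_name, dim_value):
        # One input metric of the math expression; not returned as alarm data itself.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": dim_name, "Value": dim_value}],
                },
                "Period": 60,  # 1-minute datapoints
                "Stat": "Average",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": alarm_name,
        "Metrics": [
            metric_query("m2", "AWS/AutoScaling", "GroupInServiceInstances",
                         "AutoScalingGroupName", asg_name),
            metric_query("m1", "ECS/ContainerInsights", "TaskCount",
                         "ClusterName", cluster_name),
            # The expression is the only query the alarm evaluates.
            {"Id": "e1", "Expression": "m2 - m1", "Label": "m2 - m1", "ReturnData": True},
        ],
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "Threshold": 5.0,
        "EvaluationPeriods": 1,   # look at 1 period...
        "DatapointsToAlarm": 1,   # ...and alarm if that 1 datapoint breaches
        "TreatMissingData": "breaching",
    }


# Hypothetical names; AlarmActions (the scaling policy ARN) omitted.
params = build_alarm_params("ecs-spare-capacity", "my-asg", "my-cluster")
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Comparing a dict like this against the real alarm definition is a quick way to spot a mismatched operator, period, or missing-data setting.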

The problem is that the alarm does not trigger when expected. Once the expression breaches the threshold (5), the configuration above (1 datapoint within 1 minute) means the alarm should trigger after about a minute. Instead, it triggers after an inconsistent delay of 2 to 15 minutes, which delays the actions (autoscaling) associated with the alarm.
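To make the expectation concrete, here is a simplified model of "1 datapoint within 1 minute, missing data treated as breaching". It is only a sketch: the per-minute values are hypothetical (and the date arbitrary), and it ignores when datapoints actually arrive in CloudWatch, so it gives the earliest time the alarm could fire, not how CloudWatch schedules its evaluations.

```python
from datetime import datetime, timedelta


def expected_trigger(points, threshold=5.0):
    """Return the timestamp of the first breaching 1-minute datapoint, or None.

    With 1 evaluation period and 1 datapoint to alarm, a single breaching
    datapoint is enough. A value <= threshold breaches, and because missing
    data is treated as breaching, a missing value (None) breaches too.
    """
    for ts, value in points:
        if value is None or value <= threshold:
            return ts
    return None


# Hypothetical per-minute values of m2 - m1 (the date is arbitrary).
day = datetime(2022, 1, 21)
points = [
    (day.replace(hour=7, minute=21), 7.0),
    (day.replace(hour=7, minute=22), 6.0),
    (day.replace(hour=7, minute=23), 5.0),
]
expected = expected_trigger(points)
observed = day.replace(hour=7, minute=40)  # trigger time read off the screenshot
print(expected.time(), observed - expected)  # 07:23:00 0:17:00
```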

Please refer to this link for a screenshot. The blue line in the first graph is the expression value and the red line is the threshold. As can be seen, the expression crosses the threshold at 7:23, but the alarm triggers at 7:40: the In Alarm state (red bar) in the second chart starts 17 minutes later than it should have.

Any help is really appreciated.

asked 2 years ago, 865 views
2 answers

From the screenshot, it actually looks like the alarm is in breach when your expression is greater than the threshold, not lower.
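The direction of the breach comes from the alarm's ComparisonOperator. As a small illustration, the four static-threshold operator names used by PutMetricAlarm can be mapped to Python comparisons; the mapping and the `breaches` helper are mine, and the anomaly-detection operators are left out.

```python
import operator

# Static-threshold ComparisonOperator values mapped to the comparison that
# decides whether a single datapoint is in breach (value OP threshold).
COMPARATORS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def breaches(value, threshold, comparison_operator):
    """Hypothetical helper: True if `value` breaches under the given operator."""
    return COMPARATORS[comparison_operator](value, threshold)


# An expression value of 3 against a threshold of 5:
print(breaches(3, 5, "LessThanOrEqualToThreshold"))     # True
print(breaches(3, 5, "GreaterThanOrEqualToThreshold"))  # False
```

If the alarm fires while the expression is above 5, the stored operator is worth double-checking.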

answered 2 years ago

Hi Dhruv,

The alarm configuration and the screenshot you provided do not quite align with each other.

I suggest looking at the alarm History and checking the particular StateUpdate that happened at 7:40, as you mentioned. Its recorded reason should explain why the alarm triggered when it did. In the StateUpdate history, look for a section that starts like the example below:

...
"newState": {
  "stateValue": "ALARM",
  "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints [42.7118644067809 (21/01/22 13:00:00)] was greater than or equal to the threshold (40.0) (minimum 1 datapoint for OK -> ALARM transition).",
...

This section explains how and why the alarm was triggered. The reason data also lets you confirm the alarm's configured threshold and comparison operator.

Further down, the stateReasonData fields such as recentDatapoints, threshold, and evaluatedDatapoints give more detail about the StateUpdate.
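If you prefer to pull these fields programmatically, here is a sketch. It assumes a boto3 CloudWatch client (`describe_alarm_history` with `HistoryItemType="StateUpdate"`) and the HistoryData layout shown above; `queryDate` is assumed to be present in stateReasonData, and the sample below reuses the example values from the snippet with an illustrative, made-up `queryDate`. The `lag` it computes (newest evaluated datapoint to query time) is one way to see whether the delay comes from late evaluation.

```python
import json
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"  # e.g. 2022-01-21T13:00:00.000+0000


def fetch_state_updates(cloudwatch_client, alarm_name):
    """Return HistoryData JSON strings for an alarm's StateUpdate items.

    Pass e.g. boto3.client("cloudwatch"). Only the first page is read;
    follow NextToken for longer histories.
    """
    resp = cloudwatch_client.describe_alarm_history(
        AlarmName=alarm_name, HistoryItemType="StateUpdate"
    )
    return [item["HistoryData"] for item in resp["AlarmHistoryItems"]]


def summarize_state_update(history_data):
    """Extract state, reason, threshold and datapoints from one StateUpdate."""
    new_state = json.loads(history_data)["newState"]
    reason_data = new_state.get("stateReasonData", {})
    if isinstance(reason_data, str):  # tolerate a JSON-encoded string too
        reason_data = json.loads(reason_data)
    evaluated = reason_data.get("evaluatedDatapoints", [])
    summary = {
        "state": new_state["stateValue"],
        "reason": new_state["stateReason"],
        "threshold": reason_data.get("threshold"),
        "recent": reason_data.get("recentDatapoints", []),
        "evaluated": evaluated,
    }
    # Gap between the newest evaluated datapoint and when the query ran.
    if evaluated and "queryDate" in reason_data:
        newest = max(datetime.strptime(p["timestamp"], TS_FORMAT) for p in evaluated)
        summary["lag"] = datetime.strptime(reason_data["queryDate"], TS_FORMAT) - newest
    return summary


# Illustrative HistoryData built from the example above (queryDate is made up).
sample = json.dumps({
    "newState": {
        "stateValue": "ALARM",
        "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints "
                       "[42.7118644067809 (21/01/22 13:00:00)] ...",
        "stateReasonData": {
            "queryDate": "2022-01-21T13:01:30.000+0000",
            "recentDatapoints": [42.7118644067809],
            "threshold": 40.0,
            "evaluatedDatapoints": [
                {"timestamp": "2022-01-21T13:00:00.000+0000", "value": 42.7118644067809}
            ],
        },
    }
})
s = summarize_state_update(sample)
print(s["state"], s["threshold"], s["lag"])  # ALARM 40.0 0:01:30
```

Note that a datapoint's timestamp marks the start of its period, so even a prompt evaluation shows a lag of at least one period.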

I hope this helps you further troubleshoot your alarm configuration and its transitions into the ALARM state.

AWS
SUPPORT ENGINEER
answered 2 years ago
  • Hi Munkhbat_T,

    Thanks for the detailed response.

    I am adding more details below from the History section. These are for a different time frame (24/01/2022) than the one mentioned in the original question.

    Here is the screenshot of the alarm and metric value: https://ibb.co/s6RjBTH. As can be seen here, the alarm triggered at 4:12 when it was supposed to trigger at 3:49.

    Here is a screenshot from the History section for the same period (bottom rows): https://ibb.co/6gyjkgR

    Here is a screenshot of the state change data for the alarm triggering at 4:12: https://ibb.co/7vLgPQR
