CloudWatch Alarm configuration


SCENARIO: I have a CloudWatch alarm whose action publishes to an SNS topic. The alarm's metric comes from a metric filter that matches CRITICAL events in a Lambda log group. The Lambda function (invoked every 15 minutes) checks for CloudFormation stacks in 'error' states and logs one CRITICAL event for each stack in an error state.

      Logs::MetricFilter
      FilterPattern: '{$.level="CRITICAL"}'
      MetricValue: 1

      CloudWatch::Alarm
      AlarmActions: Send to SNS Topic
      Period: 600
      TreatMissingData: notBreaching
      ComparisonOperator: GreaterThanOrEqualToThreshold
      Threshold: 1
      EvaluationPeriods: 1
      Statistic: Maximum

The CloudWatch alarm works as expected when 1 stack is in the error state:

  • The metric filter picks up the CRITICAL event
  • The alarm changes state to 'In Alarm'
  • The SNS topic is triggered

CHALLENGE: If another stack goes into error (say, 15 minutes later) while the initial stack is still in error, the alarm doesn't act on it, i.e. it doesn't trigger the SNS topic again. I understand from research that this is normal behavior because "If your metric value is still in breach of your threshold, the alarm will remain in the ALARM state until it no longer breaches the threshold."
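The behavior described above can be illustrated with a deliberately simplified model in which the action runs only when the alarm transitions into ALARM, not on every breaching datapoint. This is a sketch, not CloudWatch's actual evaluation logic: the function name simulate_alarm is hypothetical, and missing datapoints (None) are simply treated as not breaching, mirroring TreatMissingData: notBreaching.

```python
def simulate_alarm(datapoints, threshold=1):
    """Return the indices of the periods in which the alarm action would run.

    Simplified model: one datapoint per evaluation period, and the action
    fires only on an OK -> ALARM transition.
    """
    state = "OK"
    fired = []
    for i, value in enumerate(datapoints):
        # A missing datapoint (None) is treated as not breaching.
        breaching = value is not None and value >= threshold
        new_state = "ALARM" if breaching else "OK"
        if new_state == "ALARM" and state != "ALARM":
            fired.append(i)  # state changed, so the SNS action runs
        state = new_state
    return fired


# Three consecutive breaching periods: the action runs once.
print(simulate_alarm([1, 1, 1]))     # → [0]
# A non-breaching gap lets the alarm return to OK and fire again.
print(simulate_alarm([1, None, 1]))  # → [0, 2]
```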

I have also tested and confirmed this: I used boto3 set_alarm_state to set the alarm back to OK and invoked the Lambda manually; the alarm state changed back to 'In Alarm' and the SNS topic was triggered.
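For reference, the manual reset used in that test might look roughly like the sketch below. The helper name reset_alarm, the alarm name, and the reason text are assumptions for illustration; the client is passed in so the helper can be exercised without AWS credentials.

```python
def reset_alarm(cloudwatch, alarm_name,
                reason="Manual reset so the next breach re-triggers the action"):
    """Force the alarm back to OK using the CloudWatch SetAlarmState API.

    `cloudwatch` is expected to be a boto3 CloudWatch client, e.g.
    boto3.client("cloudwatch"). The forced state is temporary: the next
    alarm evaluation sets the state from the metric data again.
    """
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="OK",
        StateReason=reason,
    )


# Usage (hypothetical alarm name, requires boto3 and AWS credentials):
#   import boto3
#   reset_alarm(boto3.client("cloudwatch"), "stack-error-alarm")
```

Resetting the state this way only works around the transition-only behavior; it does not make the alarm report each failing stack on its own.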

QUESTION: Is there any other suitable configuration or logic I can use to trigger the SNS topic for every stack in the error state?

1 Answer
Accepted Answer

You could replace Lambda and CloudWatch with CloudFormation notifications to EventBridge. See: Using CloudFormation events to build custom workflows for post provisioning management.
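A minimal sketch of what that could look like with boto3, assuming an EventBridge rule on the default event bus that matches CloudFormation stack status change events and targets the existing SNS topic. The list of statuses treated as "error", the rule name, the target ID, and the helper names are all assumptions for illustration, not part of the answer; the client is passed in so the logic can be checked without AWS access.

```python
import json

# Assumption: the stack statuses this workflow treats as "error".
ERROR_STATUSES = [
    "CREATE_FAILED",
    "ROLLBACK_FAILED",
    "ROLLBACK_COMPLETE",
    "UPDATE_ROLLBACK_FAILED",
    "UPDATE_ROLLBACK_COMPLETE",
    "DELETE_FAILED",
]


def build_event_pattern(statuses=ERROR_STATUSES):
    """Event pattern matching stack status changes into the given statuses."""
    return {
        "source": ["aws.cloudformation"],
        "detail-type": ["CloudFormation Stack Status Change"],
        "detail": {"status-details": {"status": list(statuses)}},
    }


def create_stack_error_rule(events, rule_name, topic_arn):
    """Create the rule and point it at the SNS topic.

    `events` is expected to be a boto3 EventBridge client, e.g.
    boto3.client("events").
    """
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "stack-error-sns", "Arn": topic_arn}],
    )
```

Because each stack emits its own status change event, a rule like this would deliver one notification per failing stack instead of one per alarm transition. The SNS topic's access policy also needs to allow events.amazonaws.com to publish to it.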

kentrad (AWS, Expert)
answered 1 year ago
  • This looks like a very viable solution. Thank you.
