CloudWatch Alarm configuration


SCENARIO: I have a CloudWatch alarm whose action publishes to an SNS topic. The alarm's metric comes from a metric filter that matches CRITICAL events in a Lambda log group. The Lambda (invoked every 15 minutes) checks for CloudFormation stacks in 'error' states and logs a CRITICAL event for each stack in an error state.

      Logs::MetricFilter
      FilterPattern: '{$.level="CRITICAL"}'
      MetricValue: 1

      CloudWatch::Alarm
      AlarmActions: Send to SNS Topic
      Period: 600
      TreatMissingData: notBreaching
      ComparisonOperator: GreaterThanOrEqualToThreshold
      Threshold: 1
      EvaluationPeriods: 1
      Statistic: Maximum

The CloudWatch alarm works as expected when 1 stack is in the error state:

  • The metric filter picks up the CRITICAL event
  • The alarm changes state to 'In alarm'
  • The SNS topic is triggered

CHALLENGE: If another stack goes into error (say, 15 minutes later) while the initial stack is still in error, the alarm doesn't act on it, i.e. it doesn't trigger the SNS topic again. I understand from research that this is normal behavior because "If your metric value is still in breach of your threshold, the alarm will remain in the ALARM state until it no longer breaches the threshold."
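To illustrate the behavior quoted above, here is a deliberately simplified toy model (not CloudWatch's actual evaluation logic): an alarm action fires only on a transition into ALARM, so consecutive breaching datapoints produce a single notification. The function name and the one-datapoint-per-period assumption are hypothetical.

```python
def alarm_notifications(datapoints, threshold=1):
    """Return the indices of periods in which a notification would fire.

    Toy model: one datapoint per period, None means missing data, and
    missing data is treated as not breaching (as in the alarm config).
    """
    state = "OK"
    fired = []
    for i, value in enumerate(datapoints):
        breaching = value is not None and value >= threshold
        new_state = "ALARM" if breaching else "OK"
        # Actions run only when the alarm enters ALARM, not while it stays there.
        if new_state == "ALARM" and state != "ALARM":
            fired.append(i)
        state = new_state
    return fired


# A second stack failing while the alarm is already in ALARM adds no notification.
print(alarm_notifications([1, 1, None, 1]))  # [0, 3]
```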

I have also tested and confirmed this: I used boto3's set_alarm_state to set the alarm back to OK and invoked the Lambda manually; the alarm state changed back to 'In alarm' and the SNS topic was triggered.
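The manual reset described above could look roughly like the sketch below. The helper name, the alarm name in the usage comment, and the reason text are assumptions for illustration; `set_alarm_state` is the boto3 CloudWatch client call mentioned in the post. The client is passed in so the helper can be exercised without AWS credentials.

```python
def reset_alarm_to_ok(cloudwatch_client, alarm_name,
                      reason="Manual reset so the next breach re-triggers the action"):
    """Force the alarm back to OK; the next breaching evaluation re-enters ALARM."""
    cloudwatch_client.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="OK",
        StateReason=reason,
    )


# Usage (hypothetical alarm name):
#   import boto3
#   reset_alarm_to_ok(boto3.client("cloudwatch"), "lambda-critical-events")
```

Note that the forced state is temporary: the alarm returns to whatever its metric dictates at the next evaluation.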

QUESTION: Is there any other suitable configuration or logic I can use to trigger the SNS topic for every stack in the error state?

1 Answer
Accepted Answer

You could replace the Lambda and the CloudWatch alarm with CloudFormation notifications to EventBridge. See: Using CloudFormation events to build custom workflows for post provisioning management.
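A minimal sketch of that approach, assuming an EventBridge rule on the default event bus with the SNS topic as its target. The rule name, the target Id, the helper names, and the choice of which stack statuses count as "error" are all assumptions for illustration; `put_rule` and `put_targets` are boto3 EventBridge client calls. Because EventBridge delivers one event per stack status change, each failing stack would produce its own notification.

```python
import json

# Assumption: these statuses are what "error state" means here; adjust as needed.
ERROR_STATUSES = [
    "CREATE_FAILED",
    "ROLLBACK_FAILED",
    "ROLLBACK_COMPLETE",
    "UPDATE_FAILED",
    "UPDATE_ROLLBACK_FAILED",
    "UPDATE_ROLLBACK_COMPLETE",
    "DELETE_FAILED",
]


def build_event_pattern(statuses=ERROR_STATUSES):
    """Match CloudFormation stack status change events with an error status."""
    return {
        "source": ["aws.cloudformation"],
        "detail-type": ["CloudFormation Stack Status Change"],
        "detail": {"status-details": {"status": list(statuses)}},
    }


def create_stack_error_rule(events_client, rule_name, topic_arn):
    """Create the rule and point it at the SNS topic."""
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify-sns", "Arn": topic_arn}],
    )


# Usage (hypothetical names; topic_arn is the ARN of the existing SNS topic):
#   import boto3
#   create_stack_error_rule(boto3.client("events"), "cfn-stack-errors", topic_arn)
```

The SNS topic's access policy would also need to allow EventBridge (events.amazonaws.com) to publish to it.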

kentrad (AWS, Expert)
answered 1 year ago
  • This looks like a very viable solution. Thank you.
