How can I set up an SMS text message for this alert: "StatusCheckFailed > 0 for 1 datapoints within 5 minutes"?

Hi re:Post, I am reviewing the AWS CloudWatch monitoring and alerts ecosystem. We get a few of these alerts a month: "StatusCheckFailed > 0 for 1 datapoints within 5 minutes"

  1. What can I read/research/config in order to get an SMS text message for this alert?
  2. What can I read/research to understand what actually failed?
  3. What does a 'StatusCheck' check?

Thank you for your time and help! Best Regards, Donald

1 Answer
Accepted Answer

Here is how to set up an SMS text message for the "StatusCheckFailed > 0 for 1 datapoints within 5 minutes" alarm in Amazon CloudWatch, along with answers to your other two questions.

  1. To get the SMS text message set up:

First, create an Amazon SNS (Simple Notification Service) topic to deliver the notification. In the AWS Management Console, open the SNS service and create a new Standard topic. Then create a subscription on that topic: choose "SMS" as the protocol and enter the phone number that should receive the texts as the endpoint. If your account is still in the SNS SMS sandbox, the destination number has to be verified before messages are delivered to it. Optional settings such as the sender ID are configured in the SNS text messaging (SMS) preferences rather than on the topic, and sender ID support varies by destination country.

Next, open the CloudWatch service and edit the existing alarm (or create a new one) on the "StatusCheckFailed" metric. In the alarm's notification settings, add an action for the "In alarm" state and select the SNS topic you just created. From then on, whenever "StatusCheckFailed" goes above 0 for 1 data point within a 5-minute period, the alarm changes state and SNS sends an SMS text message to the subscribed number.

AWS Documentation: https://docs.aws.amazon.com/sns/latest/dg/sns-mobile-phone-number-as-subscriber.html https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
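
If you prefer to script the setup, the console steps above can be sketched with boto3 roughly as follows. This is a minimal sketch, not a definitive implementation: the topic name, the alarm name and description, and the "Maximum" statistic are assumptions chosen for illustration, and `create_sms_alarm` needs AWS credentials plus a phone number in E.164 format. The alarm and the SNS topic are created in the same Region.

```python
"""Sketch: wire an EC2 StatusCheckFailed alarm to an SMS subscriber via SNS."""


def build_alarm_params(instance_id: str, topic_arn: str) -> dict:
    """Return keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        # Alarm name and description are hypothetical; pick your own.
        "AlarmName": f"{instance_id} EC2 status check failed",
        "AlarmDescription": "EC2 status check failed; notifies by SMS through SNS",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",  # assumption: the existing alarm may use another statistic
        "Period": 300,           # each evaluated datapoint covers 5 minutes
        "EvaluationPeriods": 1,  # evaluate only the most recent datapoint
        "DatapointsToAlarm": 1,  # "for 1 datapoints within 5 minutes"
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # "> 0"
        "AlarmActions": [topic_arn],  # notify this SNS topic on ALARM
    }


def create_sms_alarm(instance_id: str, phone_number: str, region: str) -> str:
    """Create the topic, the SMS subscription, and the alarm; return the topic ARN."""
    import boto3  # imported lazily so build_alarm_params works without boto3

    sns = boto3.client("sns", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)

    # "ec2-status-check-sms" is a hypothetical topic name.
    topic_arn = sns.create_topic(Name="ec2-status-check-sms")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="sms", Endpoint=phone_number)
    cloudwatch.put_metric_alarm(**build_alarm_params(instance_id, topic_arn))
    return topic_arn
```

Setting `AlarmDescription` here is also what fills in the "Description" field that the console otherwise shows as empty.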

  2. To understand what actually failed:

The "StatusCheckFailed" metric is an Amazon EC2 metric (namespace AWS/EC2). It reports whether an instance passed its status checks: 0 means every check passed and 1 means at least one check failed. The InstanceId dimension on the alarm tells you which instance was affected, and the alarm's history shows when it changed state. To find out which check failed, look at the companion metrics "StatusCheckFailed_System" and "StatusCheckFailed_Instance" for the same instance and time window, or review the instance's status check details in the EC2 console. Knowing whether the system check or the instance check failed narrows down where to investigate.

AWS Documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html
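
As a rough illustration of that investigation, the sketch below pulls the two companion metrics for the 30 minutes before a given time and reports which of them recorded a failure. The function names, the 30-minute window, and the use of the "Maximum" statistic are assumptions for this example; `fetch_datapoints` needs boto3 and AWS credentials, while `failed_checks` only inspects the returned datapoints.

```python
from datetime import datetime, timedelta

# Companion metrics that split StatusCheckFailed into its two parts.
SUB_METRICS = ("StatusCheckFailed_System", "StatusCheckFailed_Instance")


def failed_checks(datapoints_by_metric: dict) -> list:
    """Return the metric names that reported at least one failure (value > 0)."""
    return [
        name
        for name, points in datapoints_by_metric.items()
        if any(point["Maximum"] > 0 for point in points)
    ]


def fetch_datapoints(instance_id: str, end: datetime, region: str) -> dict:
    """Fetch 1-minute datapoints for both sub-metrics in the 30 minutes before `end`."""
    import boto3  # imported lazily so failed_checks works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    start = end - timedelta(minutes=30)  # assumed look-back window
    result = {}
    for name in SUB_METRICS:
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName=name,
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=60,
            Statistics=["Maximum"],
        )
        result[name] = response["Datapoints"]
    return result
```

For example, passing the alarm's "Last state update" time as `end` and feeding the result to `failed_checks` would show whether the system check, the instance check, or both failed around that alarm.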

  3. And what does a 'StatusCheck' check?

For EC2 instances, AWS runs automated status checks every minute. The system status check monitors the AWS infrastructure that the instance runs on, for example loss of network connectivity or power, or hardware and software problems on the physical host. The instance status check monitors the software and network configuration of the instance itself, for example incorrect networking or startup configuration, exhausted memory, a corrupted file system, or an incompatible kernel. Other services, such as load balancers, report health through their own health checks and metrics rather than through "StatusCheckFailed".

AWS Documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/monitor_overview.html
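
To see the two checks side by side for a running instance, a minimal sketch using the EC2 `describe_instance_status` call could look like this. The helper names are hypothetical, `current_status` needs boto3 and AWS credentials, and the call returns the current status only, not the history of past failures.

```python
def summarize_status(response: dict) -> dict:
    """Map each instance ID to its system and instance status check results."""
    summary = {}
    for item in response.get("InstanceStatuses", []):
        summary[item["InstanceId"]] = {
            "system": item["SystemStatus"]["Status"],      # e.g. "ok" or "impaired"
            "instance": item["InstanceStatus"]["Status"],  # e.g. "ok" or "impaired"
        }
    return summary


def current_status(instance_id: str, region: str) -> dict:
    """Query EC2 for the current status checks of one instance."""
    import boto3  # imported lazily so summarize_status works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return summarize_status(response)
```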

AWS
answered 2 years ago
EXPERT
reviewed 2 years ago
  • Hi @Harshika, Thank you for your quick reply and help! Regarding answer for #2: "You can dig into the details of the failed check in the CloudWatch console to see exactly which resource had the problem and what the issue was."

    When I look at the Details of the alert (CloudWatch > Alarms > vir.trulab.com EC2 status check failed > Details), it does not give much information:

    Name: vir.trulab.com EC2 status check failed
    Type: Metric alarm
    Description: No description
    State: OK
    Threshold: StatusCheckFailed > 0 for 1 datapoints within 5 minutes
    Last state update: 2024-05-13 22:17:08 (UTC)

    a) For "Threshold", what "datapoints" is it referring to?
    b) Why is there 'No description' for Description?

    Thanks again! Best Regards, Donald
