CloudWatch CPU credit balance monitoring breaks off


Hello.

This text is translated, so please excuse any errors.

Since about a month ago, the CPU credit balance graph in CloudWatch has had breaks in it.

Why?

This is causing me trouble because I use this value to trigger an alarm.

Thanks for your help.

[Two images were attached here.]

asked 2 years ago, 273 views
2 Answers

Thank you for sharing this information.

I launched a Windows Server 2019 Base instance on t2.micro in my environment to check, but could not reproduce the issue.

I cannot investigate your environment, so this is only reference information, but here is how you could investigate, based on the official AWS documentation. [1]

[1] Using Amazon CloudWatch alarms - Amazon CloudWatch https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
----- Excerpt -----
Sometimes, not every expected data point for a metric gets reported to CloudWatch. For example, this can happen when a connection is lost, a server goes down, or when a metric reports data only intermittently by design.
----- Excerpt -----

According to the documentation above, data points for a metric can be missing.

One example is when a server goes down.
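
Since the question concerns an alarm on the CPU credit balance, it may help to see how an alarm can be told what to do with missing data points. The following is only a minimal sketch under stated assumptions: it builds the parameters for the CloudWatch PutMetricAlarm API (`put_metric_alarm` in boto3) without calling AWS. The alarm name, threshold, evaluation periods, and instance ID are hypothetical values chosen for illustration, and `build_credit_alarm_params` is a helper name introduced here.

```python
# Sketch: parameters for a CPUCreditBalance alarm with explicit missing-data handling.
# Only the dictionary is built here; nothing is sent to AWS.

# The four values CloudWatch accepts for TreatMissingData.
ALLOWED_MISSING_DATA = {"breaching", "notBreaching", "ignore", "missing"}


def build_credit_alarm_params(instance_id, threshold, treat_missing_data="breaching"):
    """Return keyword arguments suitable for boto3's put_metric_alarm."""
    if treat_missing_data not in ALLOWED_MISSING_DATA:
        raise ValueError(f"unsupported TreatMissingData value: {treat_missing_data}")
    return {
        "AlarmName": f"cpu-credit-balance-low-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 300,            # CPU credit metrics are reported every 5 minutes
        "EvaluationPeriods": 3,   # assumption: three consecutive periods
        "Threshold": float(threshold),
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": treat_missing_data,
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch can be read and tested without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical instance ID and threshold, for illustration only.
example_params = build_credit_alarm_params("i-0123456789abcdef0", threshold=20)
```

With `treat_missing_data="breaching"`, periods without data count toward the alarm condition, so a gap in the metric would itself raise the alarm. With `"notBreaching"` or `"ignore"`, a gap would not trigger it. Which choice is appropriate depends on how you want the alert to behave during the interruptions.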

If you want to check whether there is a problem with the EC2 instance, the following documentation may be helpful. [2]

[2] Status checks for your instances - Amazon Elastic Compute Cloud
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html

From the above document, StatusCheckFailed_System indicates a failure on the AWS infrastructure side, while StatusCheckFailed_Instance indicates a problem with the OS or app inside the EC2 instance.

Therefore, please check the status check metrics for the EC2 instance in your environment and see whether any status check failed during the periods when the CPU credit balance metric is interrupted.

If a status check failed during those periods, it may be related to the interrupted metrics.

If the status check metrics show nothing unusual, check the OS and application logs inside the EC2 instance for communication errors or unexpected restarts.

Temporary communication errors or similar issues may be the cause.
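
The comparison described above, checking status check failures against the periods where the CPU credit balance is missing, could be sketched as follows. This is an illustration under assumptions, not a tested diagnostic: it works on plain lists of timestamps, such as those found in the `Datapoints` returned by CloudWatch GetMetricStatistics, and the sample data and the function names `find_gaps` and `failures_in_gaps` are made up for this example.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, period=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where consecutive datapoints
    are more than one reporting period apart."""
    ordered = sorted(timestamps)  # datapoints are not guaranteed to arrive in order
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > period]


def failures_in_gaps(gaps, status_points):
    """Return the (timestamp, value) status check datapoints that report a
    failure (value > 0) and fall inside any of the given gaps."""
    return [
        (ts, value)
        for ts, value in status_points
        if value > 0 and any(start <= ts <= end for start, end in gaps)
    ]


# Made-up sample data: credit balance datapoints are missing between 00:10 and 00:30.
base = datetime(2023, 1, 1, 0, 0)
credit_timestamps = [base + timedelta(minutes=5 * i) for i in (0, 1, 2, 6, 7)]

# Made-up StatusCheckFailed datapoints: one failure at 00:20.
status_points = [
    (base + timedelta(minutes=5), 0.0),
    (base + timedelta(minutes=20), 1.0),
    (base + timedelta(minutes=35), 0.0),
]

gaps = find_gaps(credit_timestamps)
hits = failures_in_gaps(gaps, status_points)
```

If `hits` is empty for your real data, the status checks do not explain the gaps, and the OS and application logs would be the next place to look.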

The following document is in Japanese, but it states that it is difficult to completely prevent failures from occurring in AWS, so temporary communication errors can occur. [3]

[3] Guidelines for Technical Inquiries | AWS Support
https://aws.amazon.com/jp/premiumsupport/tech-support-guidelines/#others2
----- Excerpt -----
Failures are unpredictable and inevitable, and while AWS strives to analyze the causes and reduce the incidence of infrastructure failures, it is difficult to completely prevent them from occurring.
----- Excerpt -----

If you want to get to the root cause of this problem, sign up for an AWS Support plan at the Developer level or higher and contact technical support. [4]

[4] AWS Support Plan Comparison | Developer, Business, Enterprise, Enterprise On-Ramp | AWS Support
https://aws.amazon.com/premiumsupport/plans/?nc1=h_ls

I am sorry that I could not identify the root cause of the problem.
I hope the above information is of some help.

mn87
answered 2 years ago

If you can answer the following questions as far as you are able, we may be able to investigate further.

・Does the same thing happen with any EC2 instance?
・On which instance type does it occur?
・Was there no interruption until one month ago?
・Did you make any changes during the month?
・Is it still the same even if you change the display period of CloudWatch to something other than 12 hours?
・Am I correct in assuming that other metrics are uninterrupted?
・Are there steps you can take to reproduce the event?

I created an Amazon Linux 2 EC2 instance on t2.micro to test this, but could not reproduce the issue.

mn87
answered 2 years ago
Q. Does the same thing happen with any EC2 instance?
A. It happens only on this EC2 instance.

Q. On which instance type does it occur?
A. Amazon Windows Server 2019 Base on t2.micro.

Q. Was there no interruption until one month ago?
A. I could not verify this.

Q. Did you make any changes during the month?
A. I added alerts based on the CPU credit balance value. I have added an image to the original question; please refer to it.

Q. Is it still the same even if you change the display period of CloudWatch to something other than 12 hours?
A. It is the same.

Q. Am I correct in assuming that other metrics are uninterrupted?
A. CPU credit usage was also interrupted. CPU usage was uninterrupted.

Q. Are there steps you can take to reproduce the event?
A. There are none.
