EC2 instance 100% CPU due to system interrupts

We have a t2.micro instance (Windows) that every so often goes to 100% CPU and stays there indefinitely, resulting in massively slow response time. The Windows performance monitor shows all available CPU resource is consumed by system interrupts. The only solution seems to be to stop the instance and restart it. Then it runs fine for another week or two, then this happens again.

I read a few (old) accounts in the forum saying that when a single-CPU instance lands on vCPU 0, it handles all the I/O interrupts for all other VMs on that host. I don't know if that is still true, but how can I avoid this? It is not acceptable to have to restart the instance: we don't find out about this until it is already happening and our customers are getting poor service. It is not OK that the resources we PAY for are being used to service other users of the hardware.

The t2.micro performs just fine otherwise so I see no need to upgrade to a larger instance (the next level would be more than 2X the cost).

Any idea if this is in fact the issue, or how to avoid it?

asked 2 years ago, 2180 views
1 Answer

Hello! The t2.micro instance type has just 1 GiB of RAM (https://aws.amazon.com/ec2/instance-types/t2/). Depending on your workload, it might not have enough resources for what you need.

Additionally, t2 instances offer "burstable" CPU performance. By default, you get a baseline level of performance determined by the instance size, and the instance can burst above that baseline for as long as it has CPU credits. Once the credit balance is depleted, performance is limited to the baseline until credits accrue again. For more information about the CPU burst model, please check https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html
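To make the credit model concrete, here is a rough sketch of the arithmetic in Python. The figures used (6 credits earned per hour and a maximum balance of 144 for a t2.micro) are taken from the AWS burstable-instance documentation as I understand it, not from the question, so please verify them against the page linked above. The function names are my own and purely illustrative.

```python
# Assumed t2.micro figures (check the current AWS documentation):
# 6 CPU credits earned per hour, maximum accrued balance of 144 credits.
CREDITS_EARNED_PER_HOUR = 6.0
MAX_CREDIT_BALANCE = 144.0


def credits_spent_per_hour(cpu_utilization_pct: float, vcpus: int = 1) -> float:
    """One CPU credit is one vCPU running at 100% for one minute."""
    return vcpus * 60.0 * cpu_utilization_pct / 100.0


def hours_until_depleted(balance: float, cpu_utilization_pct: float) -> float:
    """Hours of sustained load before the credit balance reaches zero.

    Returns float('inf') when the load is at or below the baseline,
    because credits are then earned at least as fast as they are spent.
    """
    net_drain = credits_spent_per_hour(cpu_utilization_pct) - CREDITS_EARNED_PER_HOUR
    if net_drain <= 0:
        return float("inf")
    return balance / net_drain


if __name__ == "__main__":
    for load in (10, 25, 50, 100):
        hours = hours_until_depleted(MAX_CREDIT_BALANCE, load)
        print(f"{load:>3}% CPU: {hours:.2f} h from a full balance")
```

Under these assumptions, a full balance lasts roughly 2.7 hours at a sustained 100% CPU, after which an instance in standard mode is held to its baseline.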

In your example, your baseline performance is 10% per vCPU (as explained in the above link) and you have 1 vCPU. For more insight into your particular case, I recommend checking the CloudWatch metrics listed here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/viewing_metrics_with_cloudwatch.html. I would start with the CPUUtilization and CPUCreditBalance metrics. If the latter is close to zero, the instance needs more CPU time than its burst credits allow. If that is the case, I have a couple of suggestions:

  1. Increase the instance size, so you get a higher baseline performance and earn more CPU credits.
  2. Turn on "unlimited mode", which allows the instance to burst above the baseline for as long as needed (additional charges may apply). For more information about unlimited mode, please check https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
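If you prefer to script the check rather than use the console, something along these lines might work. It is a minimal sketch assuming boto3 is installed and credentials are configured; the helper names and the threshold of 10 credits are hypothetical choices of mine, while `get_metric_statistics` and `modify_instance_credit_specification` are the boto3 calls I believe apply here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def credit_balance_query(instance_id: str, hours: int, now: datetime) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,  # CPU credit metrics are reported every 5 minutes
        "Statistics": ["Minimum"],
    }


def lowest_balance(datapoints: list) -> Optional[float]:
    """Lowest reported credit balance, or None if there are no datapoints."""
    if not datapoints:
        return None
    return min(point["Minimum"] for point in datapoints)


def ran_low_on_credits(instance_id: str, threshold: float = 10.0) -> bool:
    """True if the balance fell below the (hypothetical) threshold in the last 24 h."""
    import boto3  # imported here so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        **credit_balance_query(instance_id, 24, now)
    )
    lowest = lowest_balance(response["Datapoints"])
    return lowest is not None and lowest < threshold


def enable_unlimited(instance_id: str) -> None:
    """Switch the instance to unlimited mode (this may incur extra charges)."""
    import boto3

    ec2 = boto3.client("ec2")
    ec2.modify_instance_credit_specification(
        InstanceCreditSpecifications=[
            {"InstanceId": instance_id, "CpuCredits": "unlimited"}
        ]
    )
```

If `ran_low_on_credits` returns True around the times the instance was slow, credit exhaustion is a likely contributor; if the balance stayed healthy, the cause is probably elsewhere.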

If you need more assistance, I encourage you to create a new support case and then an engineer will have access to see your instance metrics and provide additional suggestions.

AWS (Support Engineer), answered 2 years ago
AWS (Expert), reviewed 2 years ago
  • If CPU credits are a factor, consider switching from a t2 to a t3, since t3 instances launch in unlimited mode by default.

  • RoB: correct, although some customers prefer to keep unlimited mode disabled by default to avoid unforeseen billing charges.
