EC2 instance 100% CPU due to system interrupts


We have a t2.micro instance (Windows) that every so often goes to 100% CPU and stays there indefinitely, resulting in massively slow response time. The Windows performance monitor shows all available CPU resource is consumed by system interrupts. The only solution seems to be to stop the instance and restart it. Then it runs fine for another week or two, then this happens again.

I read a few older forum posts claiming that when a single-vCPU instance lands on vCPU 0, it handles all the I/O interrupts for every other VM on that host. I don't know if that is still true, but if it is, how can I avoid it? Having to restart the instance is not acceptable: we don't find out about the problem until it is already happening and our customers are getting poor service. It is also not OK for the resources we pay for to be used to service other users of the hardware.

The t2.micro performs just fine otherwise so I see no need to upgrade to a larger instance (the next level would be more than 2X the cost).

Any idea whether this is in fact the issue, or how to avoid it?

Asked 2 years ago · Viewed 2208 times
1 Answer

Hello! The t2.micro instance type has just 1 GB of RAM (see https://aws.amazon.com/ec2/instance-types/t2/). Depending on your workload, it might not have enough resources for what you need.

Additionally, T2 instances offer "burstable" CPU performance. By default, the instance gets a baseline level of CPU performance determined by its size and can burst above that baseline for as long as it has CPU credits. Once the credits run out, it is held to the baseline until credits accrue again. For more information about the CPU burst model, please check https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html
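
To make the credit model concrete, here is a minimal simulation sketch. It assumes the figures AWS documents for a t2.micro (6 credits earned per hour, a maximum balance of 144 credits, and one credit equal to one vCPU at 100% for one minute). The function names and the starting balance are hypothetical, and the model is simplified: it works in one-minute steps and does not model the throttling that applies once the balance reaches zero.

```python
def simulate_credit_balance(cpu_percent_by_minute, start_balance=0.0,
                            earn_per_hour=6.0, max_balance=144.0):
    """Return the CPU credit balance after each minute of the given load.

    Defaults are the documented t2.micro values; this is a simplified
    model, not a reproduction of how EC2 accounts for credits internally.
    """
    balance = start_balance
    history = []
    for cpu_percent in cpu_percent_by_minute:
        balance += earn_per_hour / 60.0   # credits earned during this minute
        balance -= cpu_percent / 100.0    # 1 credit = 1 vCPU at 100% for 1 minute
        balance = max(0.0, min(max_balance, balance))
        history.append(balance)
    return history


def minutes_until_exhausted(history):
    """Return the first minute (1-based) at which the balance is zero, or None."""
    for minute, balance in enumerate(history, start=1):
        if balance <= 0.0:
            return minute
    return None


if __name__ == "__main__":
    # Hypothetical scenario: 30 credits in the bank, then one hour pinned at 100%.
    pinned = simulate_credit_balance([100.0] * 60, start_balance=30.0)
    print("credits exhausted at minute:", minutes_until_exhausted(pinned))
```

At a sustained 100% the balance drains at a net 0.9 credits per minute, so a 30-credit balance lasts roughly 33 minutes (30 / 0.9). At exactly 10% the credits earned and spent cancel out, which is why 10% is the baseline.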

In your case, the baseline performance is 10% per vCPU (as explained in the link above) and you have 1 vCPU. For more insight into your particular case, I recommend checking the CloudWatch metrics listed here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/viewing_metrics_with_cloudwatch.html . I would start with CPUUtilization and CPUCreditBalance. If the latter is close to zero, the instance needs more CPU than its burst credits allow. If that is what you see, I have a couple of suggestions:

  1. Increase the instance size, so you get a higher baseline performance and more CPU credits.
  2. Turn on "unlimited mode", which allows the instance to burst above its baseline for as long as it needs to, even after its credits run out. For more information about unlimited mode, please check https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
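
The metric check and the second suggestion above could be scripted roughly as follows. This is a sketch, not a recommended procedure: it assumes boto3 is installed and AWS credentials are configured, the function names and the 5-credit threshold are hypothetical, and the instance ID in the usage line is a placeholder. It uses the CloudWatch `get_metric_statistics` call and the EC2 `modify_instance_credit_specification` call. Unlimited mode can incur additional charges.

```python
from datetime import datetime, timedelta, timezone


def credit_balance_query(instance_id, hours=3):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # credit metrics are reported every 5 minutes
        "Statistics": ["Minimum"],
    }


def is_credit_starved(datapoints, threshold=5.0):
    """True if any datapoint's minimum balance is at or below the threshold.

    The threshold is an arbitrary assumption; pick one that fits the workload.
    """
    return any(dp["Minimum"] <= threshold for dp in datapoints)


def enable_unlimited_if_starved(instance_id, threshold=5.0):
    """Switch the instance to unlimited mode if its credit balance ran low."""
    import boto3  # imported here so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch")
    ec2 = boto3.client("ec2")
    response = cloudwatch.get_metric_statistics(**credit_balance_query(instance_id))
    if not is_credit_starved(response["Datapoints"], threshold):
        return False
    ec2.modify_instance_credit_specification(
        InstanceCreditSpecifications=[
            {"InstanceId": instance_id, "CpuCredits": "unlimited"}
        ]
    )
    return True


# Usage (placeholder instance ID):
# enable_unlimited_if_starved("i-0123456789abcdef0")
```

If the balance never drops near zero while the CPU is pinned, credit exhaustion is probably not the cause, and the support case suggested below is the better route.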

If you need more assistance, I encourage you to create a new support case and then an engineer will have access to see your instance metrics and provide additional suggestions.

AWS
Support Engineer
answered 2 years ago
AWS
Expert
reviewed 2 years ago
  • If CPU credits are a factor, consider switching from a T2 to a T3 instance, since T3 instances launch in unlimited mode by default.

  • RoB is correct, although some customers prefer to keep unlimited mode disabled by default to avoid unforeseen charges.
