ec2 crash/oom frequently

0

Enter image description here

Enter image description here

awser
已提问 1 年前503 查看次数
1 回答
0

oom-killer is a Linux process that the kernel runs when a system is low on memory, and needs to kill a process to try and free up some memory (it's more complicated, but that's the basic gist of it).

Your graphs show that CPU maxed out for an hour before going back to zero, and then when it went back to zero your failed instance count went from zero to one. There isn't anything noteworthy in terms of the graphs of network and disk.

Putting both of these together, it is likely that your system was running low on memory, and so the Linux memory manager was trying to swap processes out of main memory and onto disk. As free memory gets less and less the memory manager will spend more and more of its time (and more and more CPU) trying to free up pages of main memory, driving the CPU usage up to 100% as you can see in the first graph. Running out of memory is also why oom-killer would be run (it's only ever run in extreme circumstances like this).

Unfortunately the EC2 section of the AWS Console doesn't display metrics for memory use, you'll need to setup CloudWatch agent to collect these https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html This will help with your troubleshooting if this situation happens again.

profile picture
专家
Steve_M
已回答 1 年前
profile picture
专家
已审核 1 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则