EC2 instance crashes / runs out of memory (OOM) frequently


[Two screenshots of the instance's monitoring graphs were attached here.]

Asked 1 year ago · Viewed 841 times
1 Answer

The OOM killer (oom-killer) is a Linux kernel mechanism that is invoked when the system is critically low on memory and has to kill a process to try to free some up (it's more complicated than that, but that's the basic gist of it).
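To confirm that the OOM killer actually ran, you can look for its messages in the kernel log. Below is a minimal sketch, assuming the log contains the usual "Killed process <pid> (<name>)" line; the helper name `find_oom_kills` and the sample log lines are made up for illustration and are not taken from the question.

```python
import re

# Matches the kernel's "Killed process <pid> (<name>)" message.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a (pid, process_name) tuple for each OOM kill found in the lines."""
    kills = []
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Illustrative sample lines (hypothetical, not from the asker's instance).
SAMPLE_LOG = [
    "[12345.678] java invoked oom-killer: gfp_mask=0x100cca, order=0, oom_score_adj=0",
    "[12345.701] Out of memory: Killed process 4321 (java) total-vm:8123456kB, anon-rss:3900000kB",
    "[12400.000] eth0: link up",
]

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"OOM killer terminated pid {pid} ({name})")
```

On a real instance you would pass in the lines from `dmesg` or `journalctl -k` instead of the sample list.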

Your graphs show that CPU was maxed out for an hour before dropping back to zero, and at the point where it dropped, your failed instance count went from zero to one. The network and disk graphs don't show anything noteworthy.

Putting both of these together, it is likely that your system was running low on memory, so the Linux memory manager was trying to swap processes out of main memory and onto disk. As free memory shrinks, the memory manager spends more and more of its time (and more and more CPU) trying to free up pages of main memory, which drives CPU usage up to 100%, as you can see in the first graph. Running out of memory is also why the OOM killer would have run; it is only invoked in extreme circumstances like this.
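If you can still log in while this is happening, a quick way to check memory pressure is to read /proc/meminfo. Here is a rough sketch, assuming a kernel recent enough to report `MemAvailable`; the function names and the sample numbers are hypothetical and chosen only to illustrate the calculation.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def used_percent(meminfo):
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


# Hypothetical sample resembling a 4 GB instance under heavy memory pressure.
SAMPLE_MEMINFO = """\
MemTotal:        4000000 kB
MemFree:           80000 kB
MemAvailable:     200000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

if __name__ == "__main__":
    # Use the real file on Linux, otherwise fall back to the sample above.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            text = handle.read()
    else:
        text = SAMPLE_MEMINFO
    print(f"Memory in use: {used_percent(parse_meminfo(text)):.1f}%")
```

A value that stays close to 100% while CPU is pegged would be consistent with the explanation above, although it doesn't prove it on its own.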

Unfortunately, the EC2 section of the AWS Console doesn't display metrics for memory use. You'll need to set up the CloudWatch agent to collect these: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html This will help with your troubleshooting if this situation happens again.
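As a starting point, the agent's JSON configuration only needs a small metrics section to report memory and swap usage. The sketch below generates one; the helper name `build_agent_config` and the 60-second interval are my own assumptions, and you should check the linked documentation for where the file goes and how to load it on your instance.

```python
import json


def build_agent_config(interval_seconds=60):
    """Build a minimal CloudWatch agent config that reports memory and swap usage."""
    return {
        "metrics": {
            # Tag each metric with the instance it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "swap": {
                    "measurement": ["swap_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved as the agent's configuration file.
    print(json.dumps(build_agent_config(), indent=2))
```

Once the agent is running with a configuration like this, you can set a CloudWatch alarm on `mem_used_percent` so you get a warning before the instance runs out of memory.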

Expert · answered 1 year ago
Expert · reviewed 1 year ago
