EC2 crashes/OOM frequently


[Two images attached to the question; not reproduced here]

awser
asked a year ago, 504 views
1 answer

The OOM killer (oom-killer) is a Linux kernel mechanism rather than a separate process: when the system runs critically low on memory, the kernel invokes it to pick a process and kill it in order to free up some memory (it's more complicated, but that's the basic gist of it).
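To confirm that the OOM killer actually fired, you can look for its messages in the kernel log. The sketch below is only illustrative: it assumes the kernel reports each victim with a "Killed process <pid> (<name>)" message, which is the usual wording but can vary between kernel versions, and the helper names `find_oom_kills` and `read_kernel_log` are made up for this example.

```python
import re
import subprocess

# Assumed message shape: "... Killed process 1234 (java) ..."
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a (pid, process_name) tuple for each kill reported in the log lines."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def read_kernel_log():
    """Read the kernel ring buffer with dmesg (this may require root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `find_oom_kills(read_kernel_log())` on the instance lists the processes that were killed, if any. Note that the ring buffer is cleared on reboot, so after a crash you may need to feed the function lines from a persisted log instead.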

Your graphs show CPU maxed out for an hour before dropping back to zero, at which point your failed instance count went from zero to one. The network and disk graphs show nothing noteworthy.

Putting these together, it is likely that your system was running low on memory, so the Linux memory manager was trying to swap pages out of main memory and onto disk. As free memory shrinks, the memory manager spends more and more of its time (and more and more CPU) trying to free up pages of main memory, which drives CPU usage up to 100%, as you can see in the first graph. Running out of memory is also why oom-killer would run (it only ever runs in extreme circumstances like this).
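If you can get a shell on the instance while the CPU is climbing, a quick look at `/proc/meminfo` shows how little headroom is left. This is a minimal sketch, assuming a kernel recent enough to report `MemAvailable`; the function names and the returned keys are invented for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure(info):
    """Return available memory and used swap as percentages of their totals."""
    available_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    if swap_total:
        swap_used_pct = 100.0 * (swap_total - info.get("SwapFree", 0)) / swap_total
    else:
        swap_used_pct = 0.0  # no swap configured
    return {"available_pct": available_pct, "swap_used_pct": swap_used_pct}


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

For example, `memory_pressure(read_meminfo())` returning an `available_pct` in the low single digits would be consistent with the memory exhaustion described above.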

Unfortunately, the EC2 section of the AWS Console doesn't display memory metrics. You'll need to set up the CloudWatch agent to collect them: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html This will help with troubleshooting if the situation happens again.
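Once the agent is installed, it needs a configuration that asks for memory metrics. As a rough sketch, the snippet below generates a minimal agent configuration that collects `mem_used_percent` and `swap_used_percent`; the 60-second interval and the function names are assumptions for this example, and you should check the generated JSON against the agent documentation linked above before using it.

```python
import json


def build_agent_config(interval_seconds=60):
    """Build a minimal CloudWatch agent config that reports memory and swap usage."""
    return {
        "metrics": {
            # Tag each data point with the instance it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "swap": {
                    "measurement": ["swap_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


def render_agent_config(interval_seconds=60):
    """Return the config as a JSON string, ready to be saved to a file."""
    return json.dumps(build_agent_config(interval_seconds), indent=2)
```

Printing `render_agent_config()` gives JSON that you can save and load into the agent. With memory graphed alongside CPU, a repeat of this incident should show memory use climbing before the CPU spike.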

EXPERT
Steve_M
answered a year ago
EXPERT
reviewed a year ago
