EC2 crash/OOM frequently


Asked by awser a year ago · 504 views

1 Answer

The OOM killer (oom-killer) is a Linux kernel mechanism that is invoked when a system is critically low on memory and the kernel needs to kill a process to free some up (it's more complicated than that, but that's the basic gist).
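If you want to confirm which process was killed, the kernel log usually records it. Here is a minimal Python sketch that scans log text for those messages. The regular expression and the sample line are assumptions for illustration: the exact wording differs between kernel versions, so adjust the pattern to what your own `dmesg` or `journalctl -k` output shows.

```python
import re

# Assumed message shape: "Killed process <pid> (<name>)".
# Kernel versions word this differently, so verify against your own logs.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for each OOM kill found in log_text."""
    return [(int(pid), name) for pid, name in OOM_KILL_RE.findall(log_text)]


# Illustrative sample line, not taken from the instance in the question.
sample_log = (
    "[12345.678] Out of memory: Killed process 4321 (java) "
    "total-vm:8123456kB, anon-rss:3900000kB\n"
)

for pid, name in find_oom_kills(sample_log):
    print(f"OOM killer terminated {name} (pid {pid})")
```

On the instance itself you would pass in the real kernel log text (for example the saved output of `dmesg`) instead of `sample_log`.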

Your graphs show that CPU was maxed out for an hour before dropping back to zero, and at the moment it dropped, your failed instance count went from zero to one. The network and disk graphs show nothing noteworthy.

Putting these together, your system was most likely running low on memory, so the Linux memory manager was trying to swap pages out of main memory and onto disk. As free memory shrinks, the memory manager spends more and more of its time (and CPU) trying to free up pages, which drives CPU usage up to 100%, as you can see in the first graph. Running out of memory is also why the OOM killer would have been invoked (it only runs in extreme circumstances like this).
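To check for this kind of memory pressure directly on the instance, you can read `/proc/meminfo`. The sketch below is only an illustration: the function names are my own, and it assumes the standard `MemTotal`, `MemAvailable`, `SwapTotal` and `SwapFree` fields are present (it falls back to `MemFree` if `MemAvailable` is missing).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field_name: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure(info):
    """Return (available_fraction, swap_used_fraction) from a parsed meminfo dict."""
    available_kb = info.get("MemAvailable", info.get("MemFree", 0))
    available = available_kb / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # With no swap configured, report 0.0 rather than dividing by zero.
    swap_used = (swap_total - info.get("SwapFree", 0)) / swap_total if swap_total else 0.0
    return available, swap_used


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `memory_pressure(read_meminfo())` on the instance gives two fractions between 0 and 1. A very low available fraction together with a high swap-used fraction would be consistent with the behaviour described above.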

Unfortunately, the EC2 section of the AWS Console doesn't display memory metrics. You'll need to set up the CloudWatch agent to collect them: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html This will help with troubleshooting if the situation happens again.
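As a rough starting point, here is a small Python sketch that prints a minimal agent configuration collecting memory and swap usage. The helper name `build_agent_config` and the 60-second interval are my own choices, and where the file needs to be saved depends on how you install the agent, so check the linked documentation before using it.

```python
import json


def build_agent_config(interval_seconds=60):
    """Build a minimal CloudWatch agent config that reports memory and swap usage."""
    return {
        "metrics": {
            # Tag each metric with the instance ID so instances can be told apart.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "swap": {
                    "measurement": ["swap_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


# Print the JSON so it can be redirected into the agent's config file.
print(json.dumps(build_agent_config(), indent=2))
```

Once the agent is running with a configuration like this, `mem_used_percent` should appear in CloudWatch (under the `CWAgent` namespace by default), and you can put an alarm on it so you hear about the problem before the OOM killer does.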

Steve_M (EXPERT), answered a year ago
Verified by an EXPERT a year ago
