Collecting a memory dump from an EC2 instance to mitigate the risk of crashes caused by underlying host issues

I am looking for a risk mitigation strategy for a critical application running on EC2. If the instance crashes because of an underlying host issue, how can we take a memory dump for diagnostics and attach it to a support case with the application provider?

Is there any way to take memory dumps proactively and collect them in CloudWatch or S3?

AWS
Pir
asked 2 months ago, 81 views
2 Answers
Hello.

Would it be enough for your purposes to collect memory utilization metrics with the CloudWatch agent?
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html

Although it may not help much when the physical EC2 host itself fails, you can also configure kdump to capture a crash dump for troubleshooting when a kernel panic occurs. On an unresponsive instance, a diagnostic interrupt can be used to trigger that panic:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/diagnostic-interrupt.html
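
As a rough sketch of how the kdump approach could be tied to S3: assuming kdump is already configured on the instance, the kernel is set to panic on an unknown NMI (for example `kernel.unknown_nmi_panic=1`), and dumps are written under `/var/crash`, the diagnostic interrupt can be sent through the EC2 API and the resulting files copied to a bucket after the instance is back up. The helper names, the crash directory, the key prefix, and the bucket are illustrative assumptions, not part of the original answer. The clients are expected to be boto3 clients, e.g. `boto3.client("ec2")` and `boto3.client("s3")`.

```python
from pathlib import Path


def send_diagnostic_interrupt(ec2_client, instance_id, dry_run=False):
    """Ask EC2 to deliver a diagnostic interrupt (NMI) to the instance.

    Whether this produces a crash dump depends on the guest OS being
    configured to panic on NMI and on kdump being set up (assumption).
    """
    return ec2_client.send_diagnostic_interrupt(
        InstanceId=instance_id, DryRun=dry_run
    )


def find_crash_dumps(crash_dir="/var/crash"):
    """Return every file below the crash directory (assumed kdump target)."""
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def upload_crash_dumps(s3_client, bucket, crash_dir="/var/crash",
                       prefix="crash-dumps"):
    """Upload each dump file to S3 and return the object keys used."""
    uploaded = []
    for path in find_crash_dumps(crash_dir):
        # Keep the directory layout kdump created under the key prefix.
        key = f"{prefix}/{path.relative_to(crash_dir).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

This only covers failures where the guest kernel can still react to the interrupt and write to disk; if the host is lost outright, no dump will be produced, which is the limitation noted above.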

EXPERT
answered 2 months ago
Hello,

If your needs relate to resource utilization monitoring, you can consider monitoring tools such as atop and sar.

Additionally, you can use the CloudWatch agent to collect more Linux metrics and publish them to CloudWatch.

[+] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html
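
To illustrate the CloudWatch agent suggestion, here is a minimal sketch that generates an agent configuration collecting memory and swap metrics. The function names and the 60-second interval are assumptions chosen for the example; the resulting JSON would still need to be placed wherever your agent installation reads its configuration from.

```python
import json


def build_agent_config(interval_seconds=60):
    """Build a CloudWatch agent config dict for memory and swap metrics."""
    return {
        "metrics": {
            "namespace": "CWAgent",
            # Tag each datapoint with the instance ID it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent", "mem_available"],
                    "metrics_collection_interval": interval_seconds,
                },
                "swap": {
                    "measurement": ["swap_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


def render_agent_config(interval_seconds=60):
    """Serialize the config as JSON text ready to write to a file."""
    return json.dumps(build_agent_config(interval_seconds), indent=2)
```

These metrics show memory pressure leading up to a failure, but they are not a memory dump and will not capture process or kernel state.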

If there is an underlying hardware issue, the instance will either reach a 2/2 status check failure or be moved to different hardware. In that case, which diagnostics do you want to share with the support engineer?
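
For reference, the status checks mentioned above can be read through the EC2 API. The sketch below assumes a boto3 EC2 client is passed in; the helper names are hypothetical. An impaired system status check generally points at the underlying host rather than the guest OS.

```python
def get_status_checks(ec2_client, instance_id):
    """Return the system and instance status check states, or None."""
    response = ec2_client.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    statuses = response.get("InstanceStatuses", [])
    if not statuses:
        return None
    entry = statuses[0]
    return {
        "system": entry["SystemStatus"]["Status"],
        "instance": entry["InstanceStatus"]["Status"],
    }


def host_issue_suspected(checks):
    """True when the system (host-level) status check reports impaired."""
    return checks is not None and checks["system"] == "impaired"
```

Recording these states at the time of the incident gives the support engineer a concrete indication of whether the failure was on the host side.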

answered 2 months ago
