Collecting a memory dump from an EC2 instance to mitigate the risk of crashes caused by underlying host issues

I am looking for a risk mitigation strategy for a critical application running on EC2. If the instance crashes because of an underlying host issue, how can we capture a memory dump for diagnostics and for attaching to a support case with the application vendor?

Is there a way to take memory dumps proactively and collect them in CloudWatch or S3?

AWS
Pir
Asked 2 months ago, 73 views
2 Answers
Hello.

Would the memory utilization metrics collected by the CloudWatch agent be enough for your needs?
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html

Although it may not help much if the EC2 physical host itself fails, you can also use kdump to capture information useful for troubleshooting when a kernel panic occurs. On an unresponsive instance, a diagnostic interrupt can be used to trigger that panic.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/diagnostic-interrupt.html
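
As a rough illustration of the diagnostic interrupt route, here is a minimal Python sketch. It assumes the guest OS is already set up to panic on an unknown NMI and has kdump enabled, as the linked page describes, and that the caller passes in a boto3 EC2 client. The helper name `send_diagnostic_interrupt` and the instance ID format check are my own choices for the example, not part of any AWS tooling.

```python
from typing import Any


def send_diagnostic_interrupt(ec2_client: Any, instance_id: str) -> bool:
    """Ask EC2 to deliver a diagnostic interrupt to one instance.

    ec2_client is assumed to be a boto3 EC2 client, for example
    boto3.client("ec2"). Returns True once the API call has been made.
    """
    # Cheap sanity check so an obviously wrong value never reaches the API.
    if not instance_id.startswith("i-"):
        raise ValueError(f"not an EC2 instance ID: {instance_id!r}")

    # The interrupt only produces a crash dump if the guest is configured
    # to panic on it and kdump is enabled (assumption, see the text above).
    ec2_client.send_diagnostic_interrupt(InstanceId=instance_id)
    return True
```

With boto3 installed and credentials configured, this could be called as `send_diagnostic_interrupt(boto3.client("ec2"), "i-0123456789abcdef0")`, where the instance ID is a placeholder. The resulting dump is written inside the instance, so copying it to S3 would be a separate step.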

Expert
Answered 2 months ago
Hello,

If your needs relate to resource utilization monitoring, consider monitoring tools such as atop and sar.

You can also collect additional Linux metrics with the CloudWatch agent and publish them to CloudWatch.

[+] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html
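
For the CloudWatch agent option, a minimal sketch of an agent configuration that collects memory and swap usage might look like the following, generated from Python so it can be templated. The function names and the 60-second default interval are assumptions for illustration; the metric names `mem_used_percent` and `swap_used_percent` come from the agent's documented metric list.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> dict:
    """Return a small CloudWatch agent config that collects memory and swap usage."""
    return {
        "metrics": {
            # Tag each data point with the instance ID so it can be filtered.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "swap": {
                    "measurement": ["swap_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


def render_agent_config(interval_seconds: int = 60) -> str:
    """Serialize the config as JSON text ready to be written to a file."""
    return json.dumps(build_agent_config(interval_seconds), indent=2)
```

The output of `render_agent_config()` can be saved as the agent's JSON configuration file. Note that these are utilization metrics only; they show memory pressure leading up to a crash but are not a memory dump.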

In the case of an underlying hardware issue, the instance will show a 2/2 status check failure or will be moved to other hardware. In that case, which diagnostics do you want to share with the support engineer?
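
To tell a host-side problem apart from a guest-side one, the two status checks can be read separately from a DescribeInstanceStatus response. The sketch below is only an illustration: the function name `summarize_status_checks` and the wording of the verdicts are hypothetical, and it works on the response dictionary that boto3's `describe_instance_status` returns.

```python
from typing import Any, Dict


def summarize_status_checks(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each instance ID to a coarse verdict based on its two status checks."""
    summary: Dict[str, str] = {}
    for item in response.get("InstanceStatuses", []):
        system = item.get("SystemStatus", {}).get("Status", "unknown")
        instance = item.get("InstanceStatus", {}).get("Status", "unknown")

        if system == "impaired":
            # The system check covers the AWS side (host, power, network).
            verdict = "system check failing: likely host-side problem"
        elif instance == "impaired":
            # The instance check covers the guest OS and its configuration.
            verdict = "instance check failing: likely guest OS problem"
        elif system == "ok" and instance == "ok":
            verdict = "both checks passing"
        else:
            verdict = f"indeterminate (system={system}, instance={instance})"

        summary[item["InstanceId"]] = verdict
    return summary
```

With boto3 available, the input could come from `boto3.client("ec2").describe_instance_status(InstanceIds=[...], IncludeAllInstances=True)`. A failing system check points toward the host, where a guest memory dump is less likely to help the application vendor.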

Answered 2 months ago
