EC2 instance in US West Oregon intermittently losing network connectivity?

0

For the last 12 hours, my EC2 instance in the US West (Oregon) region has been losing its network connection intermittently. The AWS console shows the instance as running, but it has failed status checks a dozen times, and there are gaps in the CloudWatch monitoring graphs and in my CloudWatch dashboard. SSH connections to the instance sometimes time out, and the website hosted on it is sometimes unreachable. CPU and memory usage on the instance are low. I've tried rebooting the instance, and I don't see any errors in its system log. Could this be a network infrastructure issue?

tpshek
Asked 5 years ago · 537 views
2 Answers
0

You can check the service status here: https://status.aws.amazon.com/
Alternatively, you can contact customer support about this issue.

Answered 5 years ago
0

Hello,

I am sorry to hear about the issue with your instance. I have checked your AWS account and can see that there is currently one instance running in us-west-2. I assume that this is the instance in question; please correct me if this assumption is wrong.

I have checked the instance and can see that the underlying physical host on which your instance runs experienced hardware-related issues on 2019-01-30. At the moment, the status of this physical host is 'healthy'.

Please note that in the future you can check whether an instance was affected by a hardware-related event by looking at its 'System Status Checks' [1]. The history of these checks can also be viewed in Amazon CloudWatch through the StatusCheckFailed_System metric [2,3].
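As a rough illustration of the CloudWatch lookup described above, here is a minimal Python sketch that assumes boto3 is installed and AWS credentials are configured. The helper names, the 12-hour window, the 5-minute period, and the example instance ID are assumptions made for illustration; only the AWS/EC2 namespace and the StatusCheckFailed_System metric come from the documentation linked in this answer.

```python
from datetime import datetime, timedelta, timezone


def build_status_check_query(instance_id, hours=12, period=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The look-back window and period are illustrative defaults, not AWS
    recommendations.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,
        # Maximum is 1 for a period if the system check failed at least once.
        "Statistics": ["Maximum"],
    }


def failed_timestamps(datapoints):
    """Return the sorted timestamps of periods that contain a failed check."""
    return sorted(p["Timestamp"] for p in datapoints if p.get("Maximum", 0) >= 1)


def fetch_failures(instance_id, region="us-west-2"):
    """Query CloudWatch for recent system status check failures."""
    import boto3  # imported lazily; needs boto3 and valid AWS credentials

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **build_status_check_query(instance_id)
    )
    return failed_timestamps(response["Datapoints"])
```

Calling fetch_failures("i-0123456789abcdef0") with your own instance ID (the one shown is a placeholder) should list the periods in which the system status check reported a failure.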

Please accept our apologies for the above issue and for any inconvenience caused by it.

I would like to suggest that you take a look at the Auto Recovery feature for Amazon EC2. You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair. In short, you use CloudWatch to set up an alarm that triggers when the System Status check fails, and that alarm in turn triggers an EC2 action such as "Recover this instance" [4,5].
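The alarm described above could be created along these lines. This is a sketch under stated assumptions rather than an official recipe: it assumes boto3 and configured credentials, the helper names and the alarm naming scheme are hypothetical, and the period, evaluation count, and threshold are example values you should tune. Not every instance configuration supports the recover action, so please check [4] for the current requirements.

```python
def build_recover_alarm(instance_id, region="us-west-2"):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when StatusCheckFailed_System is 1 for two consecutive
    one-minute periods (example values) and then runs the EC2 recover action.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": "Recover the instance when the system status check fails",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Region-specific ARN of the built-in EC2 recover action.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id, region="us-west-2"):
    """Create (or overwrite) the recovery alarm for one instance."""
    import boto3  # imported lazily; needs boto3 and valid AWS credentials

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recover_alarm(instance_id, region))
```

Keeping the parameter builder separate from the API call makes it possible to review or log the alarm definition before anything is created in your account.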

We also advise our customers to design their applications so that there is no single point of failure in their environment. Please refer to our white paper on Building Fault-Tolerant Applications in the AWS Cloud [6] for more information.

Please let us know if you need any further help.

Links:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html#types-of-instance-status-checks
[2] https://aws.amazon.com/blogs/aws/ec2-instance-status-metrics/
[3] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ec2-metricscollected.html#ec2-metrics
[4] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html#AddingRecoverActions
[5] https://aws.amazon.com/blogs/aws/new-auto-recovery-for-amazon-ec2/
[6] https://aws.amazon.com/whitepapers/designing-fault-tolerant-applications/

Regards,
awstomas

AWS
Answered 5 years ago