Instance reachability check failed - two days in a row

For the last two nights, at exactly 01:00 Central European Time, we have lost connectivity to an EC2 instance (i-fee7a9b0) that has run without problems for years.

A few days ago, that same instance was stopped and started as part of a system event that moved it to a new underlying host (Ref: AWS_EC2_INSTANCE_REBOOT_FLEXIBLE_MAINTENANCE_SCHEDULED_b0a31e5f-d9bc-4954-b36c-122e4638f85f).

We have tried what is described here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/screenshot-service.html; however, we cannot get a screenshot of the instance while it is in this state. Furthermore, nothing is available in the system log.

CPU usage (according to CloudWatch) is not high at the point where the reachability check suddenly fails.
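
For anyone wanting to repeat this kind of check, here is a minimal sketch of pulling an EC2 metric from CloudWatch around the time of a failure. It assumes `cloudwatch` is a boto3 CloudWatch client; the helper name `fetch_metric_max`, the 30-minute window, and the 5-minute period are illustrative choices, not something from our setup.

```python
from datetime import datetime, timedelta


def fetch_metric_max(cloudwatch, instance_id: str, metric_name: str,
                     center: datetime, window_minutes: int = 30):
    """Return (timestamp, maximum) pairs for one EC2 metric around `center`.

    `cloudwatch` is assumed to be a boto3 CloudWatch client, e.g.
    boto3.client("cloudwatch"). `fetch_metric_max` is a hypothetical helper.
    """
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric_name,  # e.g. "CPUUtilization" or "StatusCheckFailed"
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=center - timedelta(minutes=window_minutes),
        EndTime=center + timedelta(minutes=window_minutes),
        Period=300,  # assumed: 5-minute granularity (basic monitoring)
        Statistics=["Maximum"],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort them.
    points = [(p["Timestamp"], p["Maximum"]) for p in response["Datapoints"]]
    return sorted(points, key=lambda pair: pair[0])
```

Calling it once with `"CPUUtilization"` and once with `"StatusCheckFailed"` for the same `center` would give two series that can be compared side by side.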

If we stop the instance and start it again, it comes back. It is unresponsive to a reboot, too.

What should we do from here? We are running a production system, and this can't be a recurring event.

Edited by: jta on Oct 16, 2019 10:50 PM

Edited by: jta on Oct 16, 2019 10:51 PM

jta
asked 5 years ago · 257 views
2 Answers

An update: we have found that the failing reachability check correlates with almost full network utilization on the instance.
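
A rough sketch of how such a correlation could be checked from exported metric data. It assumes network samples are byte counts per period (as the CloudWatch `NetworkIn`/`NetworkOut` metrics report with the Sum statistic) and that timestamps are plain epoch seconds. The function name, the link capacity, and the 90% threshold are illustrative assumptions, not values from our instance.

```python
def failures_near_saturation(network_samples, failure_times,
                             capacity_bytes_per_s, period_s=300,
                             threshold=0.9):
    """Return (failure_time, utilization) for failures during high network load.

    network_samples: list of (period_start_epoch_s, bytes_in_period).
    failure_times:   list of epoch seconds at which a status check failed.
    capacity_bytes_per_s: assumed link capacity (hypothetical, not measured).
    """
    flagged = []
    for failure in failure_times:
        for start, byte_count in network_samples:
            # Find the sample whose period covers the failure time.
            if start <= failure < start + period_s:
                utilization = byte_count / (capacity_bytes_per_s * period_s)
                if utilization >= threshold:
                    flagged.append((failure, utilization))
                break
    return flagged


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    samples = [(0, 30_000), (300, 285_000), (600, 60_000)]
    print(failures_near_saturation(samples, [450, 700],
                                   capacity_bytes_per_s=1_000))
```

If most failure times end up in the returned list, that points at network saturation rather than CPU as the thing to investigate.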

jta
answered 5 years ago

It turns out that after our instance migrated to the new underlying host, the network driver in use was not compatible. We upgraded to the newest AWS network driver and our problems were resolved.

jta
answered 5 years ago
