Instance reachability check failed - two days in a row

For the last two nights, at exactly 01:00 Central European Time, we have lost connectivity to an EC2 instance (i-fee7a9b0) that has run without problems for years.

A few days ago, that same instance was stopped and started because of a system event in which it was moved to a new underlying host (Ref: AWS_EC2_INSTANCE_REBOOT_FLEXIBLE_MAINTENANCE_SCHEDULED_b0a31e5f-d9bc-4954-b36c-122e4638f85f).

We have tried what is described here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/screenshot-service.html. However, we cannot get a screenshot of the instance while it is in this state. Furthermore, nothing is available in the system log.

CPU usage (according to CloudWatch) is not high at the point where the reachability check suddenly fails.

The instance does not respond to a reboot either; it only comes back if we stop it and start it again.

What can we do from here? We are running a production system, and this can't be a recurring event.

Edited by: jta on Oct 16, 2019 10:50 PM

Edited by: jta on Oct 16, 2019 10:51 PM

jta
asked 5 years ago, 257 views
2 Answers

An update: we have found that the failing reachability check correlates with almost full network utilization on the instance.
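For anyone wanting to check a similar correlation, here is a minimal sketch. It assumes the timestamps of the failed status checks and the per-period network byte counts have already been exported (for example from the CloudWatch `StatusCheckFailed_Instance` and `NetworkIn`/`NetworkOut` metrics) into plain Python lists. The function name, the threshold, and all sample values are hypothetical and only illustrate the idea; they are not data from this instance.

```python
from datetime import datetime, timedelta


def failures_near_high_network(failures, network, threshold, window=timedelta(minutes=5)):
    """Return the failure timestamps that have at least one network
    datapoint at or above `threshold` within `window` of the failure."""
    hits = []
    for failed_at in failures:
        for ts, value in network:
            if abs(ts - failed_at) <= window and value >= threshold:
                hits.append(failed_at)
                break  # one matching datapoint is enough for this failure
    return hits


# Made-up sample data: (timestamp, bytes transferred in the period).
network = [
    (datetime(2019, 10, 15, 0, 50), 1.2e9),
    (datetime(2019, 10, 15, 0, 55), 7.4e9),
    (datetime(2019, 10, 15, 1, 0), 7.5e9),
    (datetime(2019, 10, 15, 12, 0), 0.9e9),
]

# Made-up timestamps at which the status check reported a failure.
failures = [
    datetime(2019, 10, 15, 1, 0),
    datetime(2019, 10, 15, 12, 2),
]

# Assumed threshold for "almost full" utilization; tune it to the instance type.
correlated = failures_near_high_network(failures, network, threshold=7.0e9)
print(correlated)
```

If most failures land in the returned list, that points at the network path (driver, bandwidth limit) rather than CPU or memory.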

jta
answered 5 years ago

It turns out that after our instance migrated to the new underlying host, the network driver in use was not compatible. We upgraded to the newest AWS network driver and our problems were resolved.

jta
answered 5 years ago
