I understand this is a critical situation for your production environment. Based on the symptoms you're describing - the instance showing as running but being inaccessible, internal errors when trying to describe the volume, and SSH timeouts - it appears your EBS volume may be impaired due to the infrastructure event.
Here are some steps you can attempt:
Immediate troubleshooting:
- Try stopping and then starting the instance again (not rebooting). This can sometimes resolve issues with instances that have EBS root volumes and may help re-establish connectivity.
- If you're able to access the volume at all, you could try enabling I/O on the volume and running a consistency check (fsck for Linux or chkdsk for Windows).
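The two steps above can be sketched in Python. This is a minimal sketch, not a definitive recovery procedure: it assumes `ec2` is a boto3 EC2 client (for example one created with `boto3.client("ec2", region_name="me-central-1")`), and the function names `stop_then_start` and `enable_io_if_impaired` are hypothetical helpers introduced only for illustration. Given the internal errors you are already seeing, any of these calls may fail or time out while the event is ongoing.

```python
def stop_then_start(ec2, instance_id):
    """Stop the instance, wait until it is stopped, then start it again.

    Assumption: `ec2` is a boto3 EC2 client. This is a full stop/start
    cycle, not a reboot.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


def enable_io_if_impaired(ec2, volume_id):
    """Return the reported volume status, enabling I/O if it is 'impaired'."""
    response = ec2.describe_volume_status(VolumeIds=[volume_id])
    status = response["VolumeStatuses"][0]["VolumeStatus"]["Status"]
    if status == "impaired":
        # Re-enable I/O; the file system check (fsck or chkdsk) still has
        # to be run from inside an instance that has the volume attached.
        ec2.enable_volume_io(VolumeId=volume_id)
    return status
```

Keep in mind that a stop/start cycle discards any instance store data, and the public IPv4 address changes unless an Elastic IP is associated with the instance.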
If the volume remains inaccessible:
If the volume has been impaired for more than 20 minutes and you cannot access it through normal means, you should contact AWS Support. Even with Basic support, you may be able to reach out through the AWS Support Center by choosing "Troubleshoot" and then "Contact Support" to submit a case, especially given this is related to a known infrastructure event.
Important considerations:
- Do you have any recent snapshots of this volume? If so, you could potentially restore your data by creating a new volume from the snapshot and attaching it to a new instance.
- For future protection, regular snapshots of critical production volumes are essential for disaster recovery scenarios like this.
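If a snapshot does exist, the restore path could look roughly like the following. Again, this is a sketch under assumptions: `ec2` is a boto3 EC2 client, the helper names `latest_completed_snapshot` and `restore_volume` are hypothetical, the default device name `/dev/sdf` is only a placeholder, and pagination of the snapshot list is ignored for brevity.

```python
def latest_completed_snapshot(ec2, volume_id):
    """Return the ID of the newest completed snapshot of the volume, or None.

    Assumption: `ec2` is a boto3 EC2 client and the snapshots are owned by
    the calling account. Only the first page of results is inspected.
    """
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "volume-id", "Values": [volume_id]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )
    snapshots = response["Snapshots"]
    if not snapshots:
        return None
    return max(snapshots, key=lambda snap: snap["StartTime"])["SnapshotId"]


def restore_volume(ec2, snapshot_id, availability_zone, instance_id,
                   device="/dev/sdf"):
    """Create a volume from the snapshot and attach it to the instance."""
    volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

The new volume has to be created in the same Availability Zone as the instance it will be attached to, so if the original zone is still affected, launch the replacement instance in a healthy zone and pass that zone here.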
Given that this is related to a specific infrastructure event in ME-CENTRAL-1, AWS Support would be best positioned to provide guidance on any recovery processes that may be available for affected resources. I strongly recommend reaching out to them directly about this situation.
Sources
- Work with an impaired Amazon EBS volume - Amazon EBS
- Root volumes for your Amazon EC2 instances - Amazon Elastic Compute Cloud
