AWS Instance Unreachable and Incompetent Support - Instance ID: *******************

0

Hello AWS Community,

I’m posting to highlight a major issue we’ve been facing with AWS support and our instance (ID: *******************), which has been down and unreachable for several days now, causing significant disruption to our business.

Here’s the timeline of events:

  1. Initial Issue: Our company’s website, which is hosted on the instance *******************, suddenly went down. The instance became completely unreachable, and despite multiple attempts from our end to boot it up, we were unsuccessful.

  2. AWS Support Case: We opened a support case (Case ID: ***************) with AWS and received assistance from their support team. The support agent, Abhiram, worked on the case for approximately 5 hours, attempting to attach and detach storage drives and boot up a new instance, but failed to resolve the issue.

  3. Blame Shifting: After several failed attempts, the agent started claiming that the issue was due to a corrupted OS, and that it was our responsibility to fix it. This was never confirmed, nor was any concrete resolution provided.

  4. Second Attempt with New Agent: The next day, we were assigned a new support agent, OM, who again attempted to resolve the issue for 3 hours with no success. Once again, after failing to boot the instance, the agent started shifting blame, claiming the issue was on our side.

  5. Unprofessional Handling: After this, I made it clear that our primary concern was getting the instance up and running – the instance should be reachable by the customer, which is AWS’s basic responsibility. I repeatedly asked for support in booting the instance, but the agents kept leaving the chat with no resolution and no clear communication.

  6. Support Inconsistency: Both agents were often unavailable for 20-30 minutes at a time, claiming they were "reviewing the case" or "discussing with internal support," but no real progress was made. This behavior is highly unprofessional and left us without a functioning instance for an extended period of time.

To date, the instance is still not reachable, and no one at AWS has provided any concrete solution or support. We are deeply frustrated and feel that AWS has failed in its basic responsibility to ensure that our instance is accessible.

This issue has caused us significant business losses, and the level of support we’ve received is unacceptable. The lack of communication, the unprofessionalism, and the inability to resolve a basic instance boot-up issue have been shocking.

I’m sharing this here to highlight the incompetence and poor customer service we’ve experienced, hoping that AWS will take notice of this case and improve its support. This situation reflects a critical bug within their data center systems that prevents our instance from booting up, and at this point, we’re left with no viable options.

I would appreciate any insights or advice from the community, as well as any suggestions on how to escalate this further within AWS to ensure that this issue is addressed.

*Edit: Removed Case ID and ARN — Zain P.

4 Answers
0

I apologize for the frustrating experience you've had with your unreachable EC2 instance and the support you've received. This situation is certainly not typical of AWS support, and I understand how disruptive it can be to your business. Let me provide some insights and potential next steps to help resolve your issue.

First, it's important to understand that an unreachable instance can be caused by various factors, including operating system failures, network issues, or hardware problems. While it's true that some of these issues may be on the customer side, AWS is responsible for ensuring the underlying infrastructure is functioning correctly.
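One way to narrow down which side a failure sits on is to look at the two EC2 status checks. The sketch below is only an illustration: `classify_status` is a hypothetical helper, and it assumes you pass it a response dictionary shaped like the one returned by the EC2 `describe_instance_status` call. A failed system status check generally points at the underlying host, while a failed instance status check generally points at the OS, file system, or network configuration inside the instance.

```python
from typing import Any, Dict


def classify_status(response: Dict[str, Any]) -> str:
    """Interpret a describe_instance_status-style response for one instance.

    Hypothetical helper: the wording of the returned hints is illustrative.
    """
    statuses = response.get("InstanceStatuses", [])
    if not statuses:
        return "no status returned: the instance may be stopped or the ID may be wrong"

    entry = statuses[0]
    system = entry.get("SystemStatus", {}).get("Status", "unknown")
    instance = entry.get("InstanceStatus", {}).get("Status", "unknown")

    if system == "impaired":
        # System checks cover the host the instance runs on.
        return "system check impaired: likely an AWS-side host problem; try stop/start"
    if instance == "impaired":
        # Instance checks cover what runs inside the instance.
        return "instance check impaired: likely OS, file system, or network config"
    if system == "ok" and instance == "ok":
        return "both checks ok: look at security groups, routes, and the application"
    return f"inconclusive (system={system}, instance={instance})"
```

With boto3, the input would typically come from `ec2.describe_instance_status(InstanceIds=[...], IncludeAllInstances=True)`, where `ec2` is an EC2 client.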

Given that multiple attempts to boot the instance have failed, and considering the length of time the instance has been unreachable, here are some steps you might consider:

  1. Escalation: If you haven't already, escalate your case within AWS support. Request to speak with a senior support engineer or a manager. Given the impact on your business, this case should receive priority attention.

  2. Create an AMI: If possible, try to create an Amazon Machine Image (AMI) from the unreachable instance. This can sometimes be done even when an instance is unresponsive. If successful, you could launch a new instance from this AMI, potentially recovering your data and environment.

  3. Volume Recovery: If creating an AMI isn't possible, you might be able to detach the root volume from the unreachable instance, attach it to a new instance as a secondary volume, and attempt to recover your data that way.

  4. New Instance: As a last resort, if data recovery isn't critical, consider launching a new instance and rebuilding your environment. While not ideal, this might be faster than waiting for a resolution if the current instance remains unrecoverable.

  5. Business Impact: Document the business impact this outage has had on your operations. This information can be useful when discussing the issue with AWS support and potentially when seeking any form of compensation or credit.

  6. Review Your Architecture: Once this immediate issue is resolved, it might be worth reviewing your architecture to implement more resilient designs, such as using multiple Availability Zones or implementing auto-scaling groups to prevent single points of failure.
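The AMI attempt in step 2 can also be scripted rather than done through the console. The following is a minimal sketch, not a definitive procedure: `try_create_ami` is a hypothetical helper, `ec2` is assumed to be a boto3 EC2 client passed in by the caller, and the broad `except` is a simplification (real code would more likely catch botocore's `ClientError`).

```python
from typing import Any, Optional


def try_create_ami(ec2: Any, instance_id: str, name: str) -> Optional[str]:
    """Attempt to create an AMI from an instance without rebooting it.

    Returns the new image ID, or None if the request is rejected.
    """
    try:
        response = ec2.create_image(
            InstanceId=instance_id,
            Name=name,
            NoReboot=True,  # do not try to shut the instance down first
        )
    except Exception as exc:  # simplification: catches any failure
        print(f"AMI creation failed for {instance_id}: {exc}")
        return None
    return response["ImageId"]
```

A caller would pass a client such as `boto3.client("ec2")`. If the AMI is created from a volume whose file system is already damaged, an instance launched from it will most likely fail to boot in the same way, so this mainly preserves the data.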

Regarding the support experience, it's disappointing to hear about the lack of consistent communication and the perceived blame-shifting. AWS typically prides itself on high-quality support, especially for business and enterprise customers. Your feedback about this experience is valuable and should be shared with AWS, as it helps them improve their support processes.

Remember, as an AWS customer, you have the right to expect professional, timely, and effective support. Don't hesitate to be persistent in seeking a resolution and escalating the issue if necessary. If you have an AWS account manager, reaching out to them directly about this situation could also be beneficial.

I hope this guidance helps you move towards a resolution. Keep pushing for answers and assistance until your instance is recovered or you have a clear path forward for your business operations.
Sources
Troubleshoot status check failures on an unreachable EC2 Linux instance | AWS re:Post
Troubleshoot an unreachable Amazon EC2 instance - Amazon Elastic Compute Cloud
Troubleshoot Amazon EC2 instance stop issues - Amazon Elastic Compute Cloud

answered a month ago
AWS
MODERATOR
reviewed a month ago
0

Assuming that you didn't make any changes to the instance configuration, such as switching between instance families (like from an "m4" to an "m6i" family), it's almost certain that your operating system installation or its file system is, indeed, corrupted or damaged.

When the EC2 instance is stopped (powered off) entirely and started again, it effectively gets started on a different hardware platform, so anything that might have been wrong with the EC2 instance itself will not get carried over to the new platform where it's started. The only things relevant to the boot-up process that do get carried over are the configuration (like the instance type and network settings) and the contents of your EBS volumes.

I suggest you first use the "Get instance screenshot" and "Get system log" options under the EC2 console's "Monitor and troubleshoot" menu to check whether any errors are reported during the boot-up process. In particular, errors related to mounting the file system could indicate the source of the problem directly. However, if the normal logon prompt is reached without any errors being reported, that would suggest a network problem instead. I'm assuming that the startup process isn't completing.
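If the system log is long, a small script can pull out the lines worth reading first. This is a rough sketch under two assumptions: the instance runs Linux (the thread doesn't say), and the pattern list, which is my own selection of common boot-failure messages, is not exhaustive. `find_boot_errors` is a hypothetical helper name.

```python
import re
from typing import List

# Common Linux boot-failure messages; an illustrative, incomplete list.
BOOT_ERROR_PATTERNS = [
    r"Kernel panic",
    r"VFS: Unable to mount root fs",
    r"EXT4-fs error",
    r"XFS .*[Cc]orruption",
    r"emergency mode",
    r"Dependency failed for",
]


def find_boot_errors(console_text: str) -> List[str]:
    """Return the console log lines that match a known boot-failure pattern."""
    hits = []
    for line in console_text.splitlines():
        if any(re.search(pattern, line) for pattern in BOOT_ERROR_PATTERNS):
            hits.append(line.strip())
    return hits
```

The text could come from the `Output` field of a boto3 `ec2.get_console_output(InstanceId=...)` response. I'd expect boto3 to return it already decoded, but if it looks like base64, decode it first. An empty result combined with a visible login prompt would fit the network-problem case instead.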

If that assumption is correct, your least-bad option will probably be to launch a new EC2 instance using the same or newer operating system version that the old instance was using. Once the new server is running, detach the EBS volume from the old instance and attach it to the new one. Mount the volume's file systems on the new instance, and if they get mounted correctly, recover any data and programs you need.
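The detach-and-attach part of that procedure could look roughly like this with boto3. Treat it as a sketch: `move_root_volume` is a hypothetical helper, `ec2` is assumed to be a boto3 EC2 client supplied by the caller, and the `/dev/sdf` device name is just a placeholder. The rescue instance has to be in the same Availability Zone as the volume.

```python
from typing import Any


def move_root_volume(
    ec2: Any,
    volume_id: str,
    old_instance_id: str,
    rescue_instance_id: str,
    device: str = "/dev/sdf",  # placeholder device name
) -> None:
    """Move an EBS volume from a broken instance to a rescue instance."""
    # Stop the old instance so the root volume can be detached cleanly.
    ec2.stop_instances(InstanceIds=[old_instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[old_instance_id])

    # Detach the volume and wait until it is free.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=old_instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it to the rescue instance as a secondary volume.
    ec2.attach_volume(
        VolumeId=volume_id, InstanceId=rescue_instance_id, Device=device
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

Mounting the file systems afterwards happens inside the rescue instance's operating system, not through the EC2 API.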

If there's a simple file system corruption problem, which I'd think reasonably unlikely, you could also try to repair the damage, but this can risk breaking it even further. If you do decide to try this unlikely route to recovery, I suggest you make an EBS snapshot of the old disk before attempting to repair it.
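Taking that snapshot can be scripted too. A minimal sketch, assuming `ec2` is a boto3 EC2 client passed in by the caller; `snapshot_before_repair` is a hypothetical helper name and the description text is arbitrary.

```python
from typing import Any


def snapshot_before_repair(ec2: Any, volume_id: str) -> str:
    """Snapshot an EBS volume and wait for completion before any repair."""
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Pre-repair backup of {volume_id}",
    )
    snapshot_id = response["SnapshotId"]
    # Block until the snapshot is complete so the repair never runs
    # against a volume without a finished backup.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id
```

If the repair makes things worse, a fresh volume can be created from the returned snapshot ID and the attempt repeated.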

EXPERT
answered a month ago
0

Thanks for reaching out.

We understand the urgency of this matter and have reached out internally.

Please keep an eye out on your support case for any updates.

— Zain P.

AWS
MODERATOR
answered a month ago
-1

Thank you for your response and for sharing your insights. While I appreciate your intention to help, I want to clarify some key points from our experience and provide some further context about the situation we’re facing.

First of all, I do understand that an unreachable instance can result from various factors, including OS failures, network issues, or even underlying hardware problems. However, I’d like to emphasize that, as an AWS customer, we rely on AWS's infrastructure to ensure the underlying systems are operational. The failure to boot the instance for over 5 days is indicative of an issue within the AWS infrastructure or management systems—not something that should fall on the customer to resolve.

Regarding the troubleshooting steps you've suggested:

Escalation: We've already escalated the case multiple times, both with agent Abhiram and agent OM, and no substantial progress has been made. In fact, the support agents have been unreachable at times, with no clear communication or resolution being provided. It's concerning that escalation doesn't seem to be yielding results.

Create an AMI: We’ve already tried to create an AMI, but due to the instance being unresponsive, this hasn't been possible. It's worth noting that we did suggest recovery options from AWS’s side, but the agents failed to provide any real guidance or effective solutions.

Volume Recovery: Detaching the root volume and attempting to recover data from a secondary instance is something we’ve tried. However, even after taking this step, there has been no resolution from AWS's side on why the instance isn’t booting, and the attempts to assist us have been minimal at best.

New Instance: This is the least desirable option for us. It’s not about starting over; it’s about maintaining the integrity of our current instance. The problem isn’t about data recovery but about the failure of AWS to provide support in booting the instance, which should be a basic expectation for a cloud service provider. Rebuilding an entire environment isn't an acceptable solution when the problem clearly lies with AWS’s systems or management.

Business Impact: Our business has already lost several days of service, which has severely impacted our revenue and customer trust. We understand that AWS can't guarantee 100% uptime, but it’s AWS's responsibility to ensure that their infrastructure and support processes are capable of addressing such issues quickly and effectively. This is where we've been failed, and we expect more from a company of AWS's scale.

Reviewing Architecture: We absolutely understand the importance of resilient architecture. However, this issue isn’t a result of our architecture; it stems from a problem in the AWS infrastructure or service management that is preventing our instance from booting and becoming accessible, and it reflects a fundamental failure to provide the promised services.

The lack of consistent, effective communication, and the dismissive attitude of support agents, especially when they start shifting blame to the customer, is absolutely unacceptable. We are not merely looking for technical advice—we are looking for AWS to take accountability and ensure that our instance is booted and reachable. The level of support we’ve received so far has been unprofessional and inadequate.

We’re posting this to share our experience in the hope that AWS can better address the shortcomings in their service and support. I strongly urge AWS to take this issue seriously and offer us a resolution. We expect AWS to meet its basic responsibility to provide customers with accessible, functioning instances, as outlined in their service agreement.

answered a month ago