EC2: Most basic Ubuntu server becomes unresponsive in a matter of minutes

Hi everyone, I'm at my wit's end on this one. I think this issue has been plaguing me for years. I've used EC2 successfully at different companies, and I know it is at least on some level a reliable service, and yet the most basic offering consistently fails on me almost immediately.

I have taken a video of this, but I'm a little worried about leaking details from the console, and it's about 13 minutes long and mostly just me waiting for the SSH connection to time out. Therefore, I've summarized it in text below, but if anyone thinks the video might be helpful, let me know and I can send it to you. The main reason I wanted the video was to prove to myself that I really didn't do anything "wrong" and that the problem truly happens spontaneously.

The issue

When I spin up an Ubuntu server with every default option (the only thing I put in is the name and key pair), I cannot connect to the internet (e.g. curl google.com fails) and the SSH server becomes unresponsive within a matter of 1-5 minutes.

Steps to reproduce

  • Go onto the EC2 dashboard with no running instances
  • Create a new instance using the "Launch Instances" button
  • Fill in the name and choose a key pair
  • Wait for the server to start up (1-3 minutes)
  • Click the "Connect" button
    • Typically I use an ssh client but I wanted to remove all possible sources of failure
  • Type curl google.com
    • curl: (6) Could not resolve host: google.com
    • I later solved this by changing the DNS server to 8.8.8.8 through systemd-resolved. However, I'm frustrated that the default DNS on a default VPC doesn't Just Work.
  • Type watch -n1 date
  • Wait 4 minutes
    • The date stops updating
  • Refresh the page
    • Connection is not possible
  • Reboot instance from the console
  • Connection becomes possible again... for a minute or two
  • Problem persists
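The `watch -n1 date` step only shows the failure while the session is still alive. A minimal sketch of an alternative, assuming Python 3 is available on the instance: write one timestamped probe result per second to a file on disk, so the last successful timestamp can be read back after a reboot. The function names, the `probe.log` path, and the choice of `8.8.8.8:53` as a TCP target are illustrative assumptions, not something taken from the steps above.

```python
import socket
import time
from datetime import datetime, timezone


def probe_dns(hostname, resolver=socket.gethostbyname):
    """Return (ok, detail) for resolving hostname with the system resolver."""
    try:
        return True, resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)


def probe_tcp(host, port, timeout=3.0, connect=socket.create_connection):
    """Return (ok, detail) for opening an outbound TCP connection."""
    try:
        with connect((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # covers timeouts and refused connections
        return False, str(exc)


def format_line(now, name, ok, detail):
    """Build one log line: ISO timestamp, probe name, OK/FAIL, detail."""
    status = "OK" if ok else "FAIL"
    return f"{now.isoformat()} {name} {status} {detail}"


def run(log_path="probe.log", interval=1.0, iterations=None):
    """Append probe results to log_path until interrupted.

    The file is line-buffered so each result reaches the OS promptly,
    even if the SSH session is cut off mid-run.
    """
    count = 0
    with open(log_path, "a", buffering=1) as log:
        while iterations is None or count < iterations:
            now = datetime.now(timezone.utc)
            results = (
                ("dns", probe_dns("google.com")),   # fails if the resolver is broken
                ("tcp", probe_tcp("8.8.8.8", 53)),  # bypasses DNS entirely
            )
            for name, (ok, detail) in results:
                log.write(format_line(now, name, ok, detail) + "\n")
            time.sleep(interval)
            count += 1
```

Calling `run()` under `nohup` or inside `tmux` keeps it going after the browser session drops. If the log keeps growing while SSH is dead, the instance itself is still running and the problem is likely on the network path; if the log stops at the same moment, the instance froze.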

Questions I would have if I were helping troubleshoot this

  • What's the instance status?
    • Running
  • What if you wait a while?
    • I can leave it running overnight and it will still fail to connect the next morning
  • Have you tried other AMIs?
    • No, I suppose I haven't, but I'd like to use Ubuntu!
  • Is the VPC/subnet routed to an internet gateway?
    • Yes, 0.0.0.0/0 routes to a newly created internet gateway
  • Does the ACL allow for inbound/outbound connections?
    • Yes, both
  • Does the security group allow for inbound/outbound connections?
    • Yes, both
  • Do the status checks pass?
    • System reachability check passed
    • Instance reachability check passed
  • How does the monitoring look?
    • It's fine/to be expected
    • CPU peaks around 20% during boot up
    • Network traffic is tiny: the Y axis is in bytes or kilobytes
  • Have you checked the syslog?
    • Yes and I didn't see anything obvious, but I'm happy to try to fetch it and give it out to anyone who thinks it might be useful. Naturally, it's frustrating to try to go through it when your SSH connection dies after 1-5 minutes.
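For anyone repeating the routing and security group checks above, here is a rough sketch of how they could be done mechanically instead of by eye. It assumes the input dictionaries have the shape returned by the EC2 `DescribeRouteTables` and `DescribeSecurityGroups` APIs (for example via `aws ec2 describe-route-tables` or boto3); the function names are made up for illustration, and only IPv4 rules are considered.

```python
import ipaddress


def has_default_route_to_igw(route_table):
    """True if the route table sends 0.0.0.0/0 to an internet gateway.

    Expects one entry of the "RouteTables" list from DescribeRouteTables.
    A route in the "blackhole" state does not count.
    """
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            and route.get("State", "active") == "active"
        ):
            return True
    return False


def allows_inbound_tcp(security_group, port, source_ip):
    """True if an inbound rule admits TCP traffic on port from source_ip.

    Expects one entry of the "SecurityGroups" list from
    DescribeSecurityGroups. Only IPv4 CIDR rules ("IpRanges") are checked;
    rules that reference other security groups are ignored in this sketch.
    """
    addr = ipaddress.ip_address(source_ip)
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            port_matches = True
        elif protocol == "tcp":
            port_matches = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            continue
        if not port_matches:
            continue
        for ip_range in perm.get("IpRanges", []):
            if addr in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False
```

Both functions are pure, so they can be run against saved JSON output and re-run a few minutes later to see whether anything changed between the moment the connection worked and the moment it stopped.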

Please feel free to ask me any other troubleshooting questions. I'm simply unable to create a usable EC2 instance at this point!

P.S. I started by asking this question on Reddit and others have recommended various troubleshooting steps. https://www.reddit.com/r/aws/comments/17jvf7u/ec2_most_basic_ubuntu_server_becomes_unresponsive/

  • Currently, the duplicate that most reminds me of this issue is this one: https://repost.aws/questions/QUTwS7cqANQva66REgiaxENA/ec2-instance-rejecting-connections-after-7-minutes#ANcg4r98PFRaOf1aWNdH51Fw

    I literally don't know how to get Amazon's customer service attention. I'm not willing to pay money so that the basic service works at all for me... that's just unreasonable.

  • Doing all the same stuff on us-east-2 works fine. I'm more and more convinced this is an issue on Amazon's end.

  • I was correct about this being an issue on the backend of AWS. Like others (such as the link in my comment above), I reached out to AWS account and billing, and after a couple of days of communication, they managed to restore connectivity to instances on us-east-1. If you encounter this in the future and you don't want to pay $30, I recommend linking this thread and others in a support ticket with AWS's account and billing team, which is available to free users. They will kindly be flexible and reach out to the technical support team on your behalf.

  • Please note that I spent many hours checking all alternatives before messaging billing. Don't request help before you have checked that your instance has a public IP, tried a brand new VPC, made sure that the VPC has an internet gateway in its routing table, and confirmed that the security group allows incoming connections on port 22.

asked 6 months ago, 223 views
2 Answers

Your Reddit post says that the region where you're seeing these problems is us-east-1. Is this happening when you stand up an Ubuntu EC2 instance in all of the pre-created subnets in the default VPC in that region? What if you create your own VPC and a subnet with an internet gateway in us-east-1, do you see the same behaviour?

You mention that you don't get the problem in us-east-2, so it would be worth double-checking the AMI that's being used. Obviously it will have a different AMI ID to the one in us-east-1 (because AMIs are region-specific), but to all intents and purposes is it the same Ubuntu image you're using in both? Could it be a private, customised AMI that's giving you problems in us-east-1, while in us-east-2 you're using the latest public AMI direct from Ubuntu (which is fine)?

Do you have anything set up in us-east-1 but not us-east-2 that might be making changes on the fly without your knowledge? I'm thinking of tools like AWS Config or GuardDuty that might have been set up years ago in us-east-1 and been lying dormant until your new instance, its security groups, etc. came into scope of their remediation activities.

If it's not that then could it be doing some kind of automatic update after it boots? Say that takes about seven minutes, and then it reboots so you lose the connection, and then it comes back with updates applied which has done something with (say) the host-based firewall. Even as I type this it sounds daft, but then so does what you're describing (I don't mean that you're describing it wrongly, more like what you're describing is very unusual).
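One cheap way to test the "it rebooted by itself" idea is to record the boot time each time you manage to log in. Below is a minimal sketch, assuming a Linux guest where `/proc/uptime` exists; the function names are made up for illustration. If the computed boot time moves forward between two sessions and you did not reboot from the console in between, the instance restarted on its own.

```python
from datetime import datetime, timedelta, timezone


def boot_time(uptime_text, now):
    """Compute the boot time from the contents of /proc/uptime.

    The first field of /proc/uptime is the number of seconds since boot.
    """
    uptime_seconds = float(uptime_text.split()[0])
    return now - timedelta(seconds=uptime_seconds)


def read_boot_time(path="/proc/uptime"):
    """Read /proc/uptime and return the boot time as a UTC datetime."""
    with open(path) as handle:
        return boot_time(handle.read(), datetime.now(timezone.utc))
```

Printing `read_boot_time()` right after connecting, and again after the next successful connection, is enough to confirm or rule out a spontaneous reboot.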

Lastly, do you get the same with a different distribution's AMI such as SUSE, and what about other Linux flavours such as Amazon Linux or RHEL/CentOS?

If you are still no further forwards after all of this, consider installing CloudWatch agent on the host (if you can get the chance) and see if it points to resource exhaustion anywhere.
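If there isn't enough time to install the CloudWatch agent before the connection drops, a much cruder stand-in is to read memory figures straight from `/proc/meminfo` during the window you do have. This is only a sketch, not a replacement for the agent, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values.

    Most entries are reported in kB, e.g. "MemTotal:  1000 kB".
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_fraction(meminfo):
    """Fraction of total memory the kernel reports as available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


def read_available_fraction(path="/proc/meminfo"):
    """Read /proc/meminfo and return the available-memory fraction."""
    with open(path) as handle:
        return available_fraction(parse_meminfo(handle.read()))
```

A value close to zero shortly before the session dies would point towards memory exhaustion; a healthy value makes that explanation less likely.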

Steve_M
answered 6 months ago

In such cases, you should open a support ticket, as this is most likely something on the AWS side that needs to be checked or fixed.

answered 4 months ago
