Ubuntu instance stops responding to all connections; fails 'instance status check'; requires stop/start


This has happened 3-4 times in the last couple of days on a new instance (t2.micro).

Issue: The instance stops accepting all incoming requests (as far as I can tell). No response to SSH. No response on web connections (80/443) to a site that is running. This seems to happen randomly -- no changes are being made on the instance at the time of failure. There are no other instances or infrastructure in the VPC -- it's standalone.

What I've Tried: Stopping and starting the instance heals it. Rebooting the instance does not appear to work (it hangs at 'rebooting'). Pulling the system log after a reboot doesn't yield any valuable info -- the log is too short and only contains boot-related messages.
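
(For reference, a minimal sketch of pulling that same system log via the API rather than the console, assuming boto3 with default credentials and a placeholder instance ID; the EC2 API returns the output base64-encoded:)

```python
# Minimal sketch: retrieve the instance's console/system log with boto3.
# The region and instance ID below are placeholders.
import base64
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.get_console_output(
    InstanceId="i-0123456789abcdef0",  # hypothetical instance ID
    Latest=True,                       # request the most recent output
)

output = resp.get("Output", "")
try:
    # The raw API returns base64-encoded text; decode it for reading.
    output = base64.b64decode(output).decode("utf-8", errors="replace")
except Exception:
    pass  # some clients hand back the log already decoded
print(output or "<no console output available>")
```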

What is Running: Ubuntu 22.04. The instance is running Lemmy (a federated Reddit alternative) and no other services: https://join-lemmy.org/

Any ideas on what could be going on or how to resolve?

tyfi
asked 10 months ago · 224 views
2 Answers

I'd encourage you to use an Auto Scaling group behind an ELB so that such events can be handled automatically.

There could be many reasons for a failed instance status check.

If an ELB with an Auto Scaling group is not an option, take a look at the Linux instance status check failure documentation.

If that helps, great; otherwise, start with Troubleshoot instances with failed status checks to isolate and rectify the issue.
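
As a starting point, here is a minimal boto3 sketch for checking which of the two status checks (system vs. instance) is failing -- the region and instance ID are placeholders:

```python
# Minimal sketch: inspect the system and instance status checks with boto3.
# The region and instance ID below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_status(
    InstanceIds=["i-0123456789abcdef0"],
    IncludeAllInstances=True,  # also report instances that are not running
)
for status in resp["InstanceStatuses"]:
    print("System status:  ", status["SystemStatus"]["Status"])
    print("Instance status:", status["InstanceStatus"]["Status"])
    for detail in status["InstanceStatus"].get("Details", []):
        print("  detail:", detail["Name"], "=", detail["Status"])
```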

Let me know how it goes.

AWS
EXPERT
answered 10 months ago

If your server is running out of memory, that could cause it to become unresponsive. This is one of the most common causes of server unresponsiveness, and it can happen when you're running heavy applications or services that consume a lot of memory. Since you're running Lemmy, which is a complex web application, it's possible that it's consuming more memory than a t2.micro instance can provide. You can check your instance's usage by logging into the AWS console and looking at the CloudWatch metrics for the instance.
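
For example, here is a minimal boto3 sketch for pulling the instance's recent CPUUtilization from CloudWatch (region and instance ID are placeholders); note that memory metrics are not published by default, as the comments below explain:

```python
# Minimal sketch: pull recent CPUUtilization for the instance from CloudWatch.
# Region and instance ID are placeholders; memory metrics are NOT available
# unless the CloudWatch agent is installed (see the comments below).
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=end - timedelta(hours=6),
    EndTime=end,
    Period=300,              # 5-minute buckets
    Statistics=["Maximum"],  # spikes show up better in the maximum
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Maximum"], 1), "%")
```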

You can also check the detailed installation requirements here: https://join-lemmy.org/docs/administration/on_aws.html

EXPERT
answered 10 months ago
  • I agree that you are probably running out of memory and suggest a larger instance type. You will not find memory usage in CloudWatch unless you install the CloudWatch agent on the instance to export memory metrics. (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html)

    Your best, quickest way to troubleshoot would be to change the instance type (a sketch of doing that via the API follows these comments).

  • Thanks for the response! Looking at CloudWatch metrics, I can see that some of the outages seem to correspond to the CPU spiking to 100%. I will change the instance type as a troubleshooting step.

  • Absolutely, consider changing the instance type; that should help immediately. Long term, as I suggested, you should consider an Auto Scaling group behind an ALB, so that a single instance going down never leaves your application impacted.
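
A minimal sketch of that instance type change via the API, assuming boto3, a placeholder instance ID, and t3.small as a hypothetical larger target (the instance must be stopped before its type can be changed):

```python
# Minimal sketch: resize the instance (it must be stopped first).
# Region, instance ID, and the target type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"

# Stop the instance and wait for it to reach the 'stopped' state.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Change the instance type while it is stopped.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "t3.small"},  # hypothetical larger type
)

# Start it back up and wait until it is running again.
ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
print("Instance resized and running again.")
```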
