Answering my own question here. The issue is related to memory resources:
- The memory allocation was too small for the workload. I therefore upgraded from 512 MB to 1 GB of RAM.
- Even after increasing memory, when attempting to run a release upgrade, the server would still run out of memory and begin to consume 60-90% of CPU resources. Eventually the high CPU usage caused SSH to stop working and ultimately the entire server instance to stop working.
- The ideal solution would be to add more RAM, but since this issue only occurs when attempting to upgrade the OS, I configured a 2 GB swap file, which resolved the issue.
I hope this helps any others who experience similar issues.
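For anyone wanting to reproduce the swap file step, here is a minimal sketch of the usual sequence on a Linux instance. The path `/swapfile`, the `SWAPFILE`, `SIZE_MB` and `DRY_RUN` variables, and the `swap_commands` helper are illustrative assumptions, not details from the answer above. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: print (or run) the common steps for adding a swap file.
# SWAPFILE and SIZE_MB are illustrative defaults, not values confirmed by the post.
SWAPFILE="${SWAPFILE:-/swapfile}"
SIZE_MB="${SIZE_MB:-2048}"      # 2 GB, the size mentioned in the answer
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = run them

swap_commands() {
    # $1 = swap file path, $2 = size in MB; emits one command per line
    echo "dd if=/dev/zero of=$1 bs=1M count=$2"   # allocate the file
    echo "chmod 600 $1"                           # restrict access to root
    echo "mkswap $1"                              # format it as swap
    echo "swapon $1"                              # enable it immediately
}

swap_commands "$SWAPFILE" "$SIZE_MB" | while IFS= read -r cmd; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: sudo $cmd"
    else
        # </dev/null keeps sudo from consuming the piped command list
        sudo sh -c "$cmd" </dev/null || exit 1
    fi
done
```

Afterwards, `swapon --show` or `free -h` should list the new swap space. The swap file is not re-enabled after a reboot unless an entry such as `/swapfile none swap sw 0 0` is added to `/etc/fstab`.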
Hi,
You could try taking a root disk snapshot, creating a disk, and attaching it to a new instance to see if that helps it recover.
Thanks for your reply, Varun. Unfortunately, any attempt to generate a snapshot also fails. In the end I had no option but to create a fresh server instance and rebuild the entire application stack manually from the bottom up. Luckily I had a tar backup, which I had created and downloaded earlier, so I was able to use the configuration files and other files from it. Nevertheless, it still took a full day to rebuild, even with that.
From now on I will create and store a snapshot whenever I make significant changes on the server, which unfortunately adds to the cost. I will also make a regular backup using tar and store that backup offline, as I no longer trust snapshots as a recovery method given this experience.
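A regular tar backup like the one described could look roughly like the sketch below. The `backup_dirs` helper and the example paths are hypothetical; which directories need to be archived depends on the application stack.

```shell
#!/bin/sh
# Sketch: archive a set of paths into a timestamped tarball.
# backup_dirs is a hypothetical helper, not something from the original post.
backup_dirs() {
    # $1 = output directory, remaining arguments = tar arguments/paths to archive
    out_dir="$1"
    shift
    archive="$out_dir/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" "$@" || return 1
    # Print the archive path so the caller can copy it elsewhere
    echo "$archive"
}

# Example (placeholder paths; reading system directories usually needs root):
#   backup_dirs /var/tmp /etc /home/ec2-user
```

The archive can be checked with `tar -tzf <archive>` and then copied off the instance, for example with `scp`, so that a copy exists outside Lightsail.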
You could potentially set up automated snapshots for the instance as well: https://docs.aws.amazon.com/lightsail/latest/userguide/amazon-lightsail-configuring-automatic-snapshots.html
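If you prefer the AWS CLI to the console, automatic snapshots are enabled through the `enable-add-on` command. The sketch below assumes a configured AWS CLI; the instance name `my-instance`, the 06:00 UTC snapshot time, and the `addon_request` helper are placeholders. It prints the command by default and only calls AWS when `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: enable the Lightsail AutoSnapshot add-on for one instance.
# INSTANCE_NAME and SNAPSHOT_HOUR are placeholders, not values from this thread.
INSTANCE_NAME="${INSTANCE_NAME:-my-instance}"
SNAPSHOT_HOUR="${SNAPSHOT_HOUR:-06}"   # two-digit hour, UTC
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the command

addon_request() {
    # $1 = two-digit UTC hour; builds the shorthand --add-on-request value
    echo "addOnType=AutoSnapshot,autoSnapshotAddOnRequest={snapshotTimeOfDay=$1:00}"
}

request="$(addon_request "$SNAPSHOT_HOUR")"
if [ "$DRY_RUN" = "1" ]; then
    echo "would run: aws lightsail enable-add-on --resource-name $INSTANCE_NAME --add-on-request $request"
else
    aws lightsail enable-add-on \
        --resource-name "$INSTANCE_NAME" \
        --add-on-request "$request"
fi
```

Check the linked user guide for the current limits on snapshot time and retention before relying on this.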
This is a challenging situation, but there are a few steps you can try to recover your Lightsail instance:
- Restart the instance: If you haven't already, try stopping and starting the instance from the Lightsail console. Sometimes this can resolve connectivity issues.
- Check firewall settings: Ensure that the firewall settings in your Lightsail instance allow SSH connections (port 22) and other necessary ports.
- Use the Lightsail console SSH: Try connecting to your instance using the built-in SSH client in the Lightsail console. This bypasses potential local network issues.
- OS-level firewall: The upgrade may have enabled an OS-level firewall blocking SSH access. You could try running commands in the startup script to disable the firewall and restart SSH service. However, this requires access to the instance, which you currently don't have.
- Resource constraints: The upgrade might have increased resource usage. Consider creating a new instance with a larger plan and restoring your data to it.
- Root volume snapshot: If possible, create a snapshot of the instance's root volume. This can preserve your data even if you can't access the instance. You could then attach this snapshot to a new, accessible instance to recover your data.
- Contact AWS Support: If none of these steps work and you're still unable to create a snapshot due to the 500 error, it's advisable to contact AWS Support. They have additional tools and access to help recover your instance or at least your data.
Remember, for future upgrades, it's always a good practice to create a snapshot before making significant changes to your instance. This provides a fallback option if issues arise during or after the upgrade process.
Clearly a stock response which does not take into account the actual content of my request.

Finally, after creating a fresh blank Lightsail instance and manually rebuilding my entire application stack from scratch, I tried again today to upgrade to the latest release of Amazon Linux 2023 using the command "sudo dnf upgrade --releasever=2023.6.20250303". This immediately locked up the existing SSH session and blocked creation of any new SSH sessions. Monitoring CPU usage on the Lightsail console shows a rise to around 60% over an initial period of 10 minutes. Thereafter CPU usage drops to around 15%, then rises again to 85% over a further 50 minutes. Eventually, after more than an hour, CPU usage falls to 2.5% and the server remains inaccessible by any means.
Eventually I stopped the server and started it again, and was again able to open an SSH session. However, "sudo dnf check-release-update" shows: A newer release of "Amazon Linux" is available.
Available Versions:
Version 2023.6.20250303: Run the following command to upgrade to 2023.6.20250303:
so clearly the upgrade has not been applied.
Two questions: