mac1.metal instance starts up with failing instance status check

1

I am trying to bring up a mac1.metal and about half the time, it'll end up with the Instance status check failing. This makes the machine unresponsive so that I cannot connect to it and can only tear it down and try bringing up a new one. This takes hours because you're required to have a dedicated host which does a scrub before you can try to allocate a new instance in its place. Nothing changes in my settings and I am using the same dedicated host. It's just a crapshoot at this point, a very expensive one at that.

Does anyone know how to better deal with instances that come up with a failing status check?

asked 2 years ago, 1131 views
3 Answers
0

Hello,

Thank you for reaching out regarding the instance status check failure.

Instance status check failures are mostly caused by problems in the OS or the network interface configuration. Take some time to review the possible causes in the "Instance status checks" documentation.

To troubleshoot this case, we need to see the error and the behavior of the instance at the time the failure occurs. That would help us understand the possible cause and how to resolve it. I would recommend reaching out to AWS Premium Support as soon as you encounter this issue.

Also review the troubleshooting documentation "Troubleshoot instances with failed status checks", in particular the section "Bringing up interface eth0: Device eth0 has different MAC address than expected, ignoring. (Hard-coded MAC address)".

If none of the above provides insight, reach out to AWS Premium Support.
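To capture what the instance looked like at the time of the failure, something along these lines may help. It is only a minimal sketch: it assumes boto3 is installed and credentials are configured, and the helper names `failed_checks` and `fetch_status` are hypothetical, not part of any AWS tooling. It lists which individual checks report `failed` in a `describe_instance_status` result, so the details can be attached to a support case.

```python
from typing import Any, Dict, List


def failed_checks(status: Dict[str, Any]) -> List[str]:
    """Return the failed checks from one InstanceStatuses entry.

    Each result is formatted as "<SystemStatus|InstanceStatus>/<check name>".
    """
    failed = []
    for kind in ("SystemStatus", "InstanceStatus"):
        summary = status.get(kind, {})
        for detail in summary.get("Details", []):
            if detail.get("Status") == "failed":
                failed.append(f"{kind}/{detail.get('Name')}")
    return failed


def fetch_status(instance_id: str) -> Dict[str, Any]:
    """Fetch the status entry for one instance (hypothetical helper)."""
    import boto3  # imported lazily so failed_checks works without boto3

    ec2 = boto3.client("ec2")
    resp = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return resp["InstanceStatuses"][0]
```

For example, `failed_checks(fetch_status(instance_id))` with your own instance ID would show whether it is the instance reachability check or a system-level check that is failing.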

answered 2 years ago
0

In my experience, once a mac1.metal dedicated host fails to start an instance (the instance is marked as "Running", but has the "instance reachability test" failing), it will never work again. Even if you have "Instance auto-recovery" enabled, it will not work. You will have to wait until you can release it (24 hours) and try to get another host.
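If you want to know when the host can be released, a rough sketch like the one below may help. It assumes the 24-hour minimum mentioned above, boto3 with configured credentials, and hypothetical helper names (`earliest_release`, `can_release`, `fetch_host`). It reads the host's `AllocationTime` and `State` from `describe_hosts` and works out the earliest release time.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Assumption: the 24-hour minimum allocation period described in this answer.
MIN_ALLOCATION = timedelta(hours=24)


def earliest_release(allocation_time: datetime) -> datetime:
    """Earliest time the host can be released, given its allocation time."""
    return allocation_time + MIN_ALLOCATION


def can_release(allocation_time: datetime, now: Optional[datetime] = None) -> bool:
    """True once the minimum allocation period has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_release(allocation_time)


def fetch_host(host_id: str) -> Dict[str, Any]:
    """Fetch one Dedicated Host description (hypothetical helper)."""
    import boto3  # imported lazily so the date helpers work without boto3

    ec2 = boto3.client("ec2")
    return ec2.describe_hosts(HostIds=[host_id])["Hosts"][0]
```

With a real host ID, `host = fetch_host(host_id)` followed by `can_release(host["AllocationTime"])` tells you whether the wait is over, and `host["State"]` shows whether the host is reported as `available`, `under-assessment` or `permanent-failure`.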

From my limited experience (up till now I ran only a few dozen mac dedicated hosts) - it is indeed a 50/50 chance to get a working host. My current solution is that if you get a failing host - chalk up the day as lost, and open a billing support case to ask for reimbursement on the cost of the failing host. They will run you around a bit (for the 24 hours that it takes to release the host - use email support, so you can use your time for other things) but will eventually give you back some money.

If enough people do that - maybe AWS will add a system to check and automatically recover (or reassign) failing hosts.

Update 23-04-2023:

Last week I had another of those mac1.metal issues and complained. After a few days of "checking", I got an email explaining that:

"The dedicated host allocated to your account h-XXX did not have the latest firmware required to launch your chosen MacOS image due to incompatibility between changes to Apple’s software update process and EC2 infrastructure. Apple will be resolving this issue in an upcoming update."

I haven't needed to use another mac1.metal instance since then, but I can be hopeful that AWS has fixed all of their Mac hardware.

answered a year ago
0

I have the same issue. My mac1.metal instance cannot pass the instance status check anymore. I've set up a new dedicated host for the EC2 instance, since the old one was released on its own with "permanent failure". However, this didn't help either.

answered a year ago
