Unexpected Reboot and EFI Shell Issue on an AWS Node in a Self-Hosted Kubernetes Cluster


Hello AWS Community,

I am experiencing an issue with an AWS node that is part of a Kubernetes cluster deployed using Kops.

Cluster Configuration:

Deployment: Kubernetes cluster via Kops.

Node Count: 3 AWS nodes.

Affected Node Details:

Stateful Pod: This node hosts a stateful pod backed by an EBS volume.

EBS Volume Configuration:

Root Volume (/dev/sda1): 24 GiB, Attached, Delete on Termination: Yes.

Data Volume (/dev/xvdba): 24 GiB, Attached, Delete on Termination: No.

Instance Uptime: Operational for 109 days before the incident.

Incident Details:

Unexpected Reboot: The instance rebooted unexpectedly.

CloudTrail Logs: No manual reboot events were logged, which suggests an AWS-initiated reboot due to health check failures (a sketch of the CloudTrail check follows this list).

AWS Health Dashboard: No scheduled maintenance noted.

Resource Utilization: Low CPU and memory usage.
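
For context, the CloudTrail check was roughly the following, written here as a boto3 sketch; the region, event names, and time window are illustrative and should be adjusted to the actual incident window:

```python
import boto3
from datetime import datetime, timezone

# Region, event names, and time window are illustrative placeholders.
cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

for event_name in ("RebootInstances", "StopInstances", "TerminateInstances"):
    resp = cloudtrail.lookup_events(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": event_name}],
        StartTime=datetime(2024, 5, 1, tzinfo=timezone.utc),
        EndTime=datetime(2024, 5, 3, tzinfo=timezone.utc),
    )
    for event in resp.get("Events", []):
        print(event_name, event["EventTime"], event.get("Username", "n/a"))
```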

Additional Observation:

After the reboot, the node displayed messages typical of a normal boot process but then dropped to the EFI Shell with the message "No bootable device found". This suggests a potential issue with the boot volume or its configuration.

Questions:

What might cause an AWS node to unexpectedly reboot and subsequently fail to find a bootable device, dropping into the EFI Shell?

How should I approach troubleshooting this EFI Shell issue in the context of an AWS EC2 instance?

Are there known issues with Kubernetes clusters deployed via Kops that could lead to such behavior, particularly regarding EBS volume configurations or instance boot settings?

Any assistance or insights into these issues would be greatly appreciated. I am looking to understand the root causes to prevent future occurrences and ensure the reliability of our Kubernetes cluster.

Thank you for your time and help.

2 Answers

The issue might lie with AWS (such as a hardware problem or an unlogged AWS action) or with your EBS configuration, particularly since the instance failed to find a bootable device. Investigating along both of those lines, while also reviewing any Kubernetes-related configuration, would be a prudent approach.
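
As a concrete starting point, here is a minimal boto3 sketch (the instance ID is a placeholder) to confirm that the root volume is still attached at the expected device and that its EBS status checks are clean:

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder for the affected node

# Confirm the root device name and which volumes are actually attached.
instance = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
print("Root device:", instance["RootDeviceName"])
for mapping in instance.get("BlockDeviceMappings", []):
    ebs = mapping["Ebs"]
    print(mapping["DeviceName"], ebs["VolumeId"], ebs["Status"])

# Check EBS volume status checks (ok / impaired / insufficient-data).
volume_ids = [m["Ebs"]["VolumeId"] for m in instance.get("BlockDeviceMappings", [])]
for status in ec2.describe_volume_status(VolumeIds=volume_ids)["VolumeStatuses"]:
    print(status["VolumeId"], status["VolumeStatus"]["Status"])
```

If the root volume shows as attached and "ok" here, the problem is more likely inside the volume (boot loader or EFI partition) than with the attachment itself.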

Amir
answered 4 months ago

The situation you're describing could be due to a few different factors. Let's address each of your questions:

  1. Causes of Unexpected Reboot and Boot Device Issues:

    • AWS-initiated Reboot: If CloudTrail shows no manual reboot events and the AWS Health Dashboard shows no scheduled maintenance, AWS may have rebooted the instance because of underlying hardware issues or failed status checks (see the status-check sketch after this list).
    • Boot Device Issues: The message "No bootable device found" points to a problem with the boot volume or boot configuration. This could be a corrupted boot volume or EFI system partition, a misconfiguration, or a failure earlier in the boot process.
  2. Troubleshooting the EFI Shell Issue:

    • Check Boot Volume Configuration: Ensure that the boot volume is correctly attached to the instance and that the instance is configured to boot from it.
    • Verify Boot Volume Integrity: If possible, check the integrity of the boot volume. You may need to create a new boot volume or restore from a backup if the volume is corrupted; see the rescue-volume sketch after this list.
    • Review EFI Shell Commands: Familiarize yourself with EFI Shell commands for diagnosing boot issues. You may need to use commands like map, fsX:, or ls to identify and access the boot volume.
    • Check Instance Logs: Review the instance's console and system logs for error messages or clues about what caused the boot failure; the console-output sketch after this list shows one way to retrieve them without shell access.
    • AWS Support: If you're unable to resolve the issue, consider reaching out to AWS Support for assistance. They can help investigate potential underlying hardware issues or provide guidance on troubleshooting steps.
  3. Issues with Kubernetes Clusters Deployed via Kops:

    • EBS Volume Configurations: While clusters deployed via Kops are generally reliable, a misconfigured EBS volume or device mapping can interfere with booting. Ensure that your EBS volumes are attached at the devices your cluster and instance expect, and that the data volume does not shadow the boot device.
    • Instance Boot Settings: Double-check the instance boot settings to ensure they are configured correctly, especially if you're using custom AMIs or have made modifications to the instance configuration.
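
To make a few of these steps concrete, here is a minimal boto3 sketch for the first point, pulling the status checks and any scheduled events for the node (the instance ID is a placeholder, and working AWS credentials and region configuration are assumed):

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder for the affected node

# Failed system status checks or scheduled events (e.g. instance retirement)
# are the usual signs of an AWS-initiated reboot or host replacement.
resp = ec2.describe_instance_status(InstanceIds=[instance_id], IncludeAllInstances=True)
for status in resp["InstanceStatuses"]:
    print("System status:  ", status["SystemStatus"]["Status"])
    print("Instance status:", status["InstanceStatus"]["Status"])
    for event in status.get("Events", []):
        print("Scheduled event:", event["Code"], event.get("Description", ""))
```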
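
For the boot-volume integrity step, one low-risk approach is to snapshot the suspect root volume, create a fresh volume from that snapshot, and attach it to a healthy rescue instance in the same Availability Zone for inspection. A sketch under those assumptions (all IDs, the AZ, and the device name are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# All IDs, the AZ, and the device name below are placeholders.
root_volume_id = "vol-0123456789abcdef0"    # suspect root volume
rescue_instance_id = "i-0fedcba9876543210"  # healthy instance in the same AZ
availability_zone = "us-east-1a"

# 1. Snapshot the suspect root volume first so nothing is lost while repairing.
snapshot = ec2.create_snapshot(VolumeId=root_volume_id,
                               Description="Root volume before rescue attempt")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# 2. Create a fresh volume from the snapshot in the rescue instance's AZ.
new_volume = ec2.create_volume(SnapshotId=snapshot["SnapshotId"],
                               AvailabilityZone=availability_zone)
ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume["VolumeId"]])

# 3. Attach it as a secondary device on the rescue instance for inspection.
ec2.attach_volume(VolumeId=new_volume["VolumeId"],
                  InstanceId=rescue_instance_id,
                  Device="/dev/sdf")
```

From the rescue instance you can then mount the copy, run a filesystem check, and verify that the EFI system partition and boot files are intact before deciding whether to rebuild the node.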
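
And for the instance-logs step, the serial console output and a console screenshot can be pulled without any access to the guest OS; a minimal sketch (instance ID again a placeholder):

```python
import base64
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder for the affected node

# The console output usually contains the last boot messages before the EFI Shell drop.
output = ec2.get_console_output(InstanceId=instance_id)
print(base64.b64decode(output.get("Output", "")).decode(errors="replace"))

# A console screenshot captures the current EFI Shell screen, useful for a support case.
screenshot = ec2.get_console_screenshot(InstanceId=instance_id, WakeUp=True)
with open("console-screenshot.jpg", "wb") as f:
    f.write(base64.b64decode(screenshot["ImageData"]))
```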

In summary, troubleshooting unexpected reboots and boot device issues on AWS EC2 instances requires a systematic approach, including verifying instance configurations, reviewing logs, and potentially reaching out to AWS Support for assistance. Pay special attention to the EFI Shell messages and instance boot settings to identify and resolve the underlying issue.

EXPERT
answered 24 days ago
