Unexpected instance reboots following cloud-init user data script termination

0

In leveraging AWS's user data mechanism, we aim to utilize the terminate behavior of instance initiated shutdown to gracefully terminate instances through the execution of a shutdown command within the user data script.

However, we've encountered instances that unexpectedly reboot and rerun the user data script instead of terminating as intended. We cannot reproduce this issue, that occur unexpectedly from time to time.

Noteworthy findings from the cloud-init-output.log contributing to this observation include:

Cloud-init v. 19.3-46.amzn2.0.1 running 'init-local' at Thu, 01 Feb 2024 08:06:53 +0000. Up 4.19 seconds.
Cloud-init v. 19.3-46.amzn2.0.1 running 'init' at Thu, 01 Feb 2024 08:06:54 +0000. Up 4.74 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++

...

/var/lib/cloud/instance/scripts/part-001: line 2:  2599 Terminated [redacted]
Feb 01 08:08:07 cloud-init[2582]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [143]
Cloud-init v. 19.3-46.amzn2.0.1 running 'init-local' at Thu, 01 Feb 2024 08:08:23 +0000. Up 4.32 seconds.
Cloud-init v. 19.3-46.amzn2.0.1 running 'init' at Thu, 01 Feb 2024 08:08:24 +0000. Up 4.89 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++

This indicates that cloud-init runs twice, with the first run "Terminated" for unknown reasons, potentially preventing the shutdown command execution. Subsequently, cloud-init runs again (after a restart that is not supposed to occur?) and the script fails (143) during the second run so no shutdown are called, leading to instances requiring manual cleanup.

We seek insights into the root cause of this behavior and whether it constitutes a bug.

已提问 3 个月前193 查看次数
1 回答
0

Hi, Did you try this with newer version of Amazon Linux 2023 which has latest version of cloud-init? This is just to rule out any potential limitation/bug with cloud-init version 19.x you have mentioned. Below are few additional points you could verify:

  • Review User Data Script: Check for any commands that might inadvertently cause a reboot rather than a shutdown. Ensure that the script explicitly calls for a shutdown (e.g., shutdown -h now or poweroff) and that there are no conditions under which a reboot command (reboot) could be executed instead.
  • Instance Settings: Verify the instance's shutdown behavior is correctly set to terminate.
  • Cloud-init Configuration: Look into cloud-init's configuration to ensure it's set up to handle user data scripts as expected. You might need to adjust cloud-init settings to prevent reexecution of scripts that should only run once.
已回答 3 个月前
profile picture
专家
已审核 1 个月前
  • Hi, we were previously using AL2 and plan to transition to the latest version of Amazon Linux in 2023. We'll ensure to provide you with updates if the issue persists at that time.

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则