Unexpected instance reboots following cloud-init user data script termination

0

In leveraging AWS's user data mechanism, we aim to utilize the terminate behavior of instance initiated shutdown to gracefully terminate instances through the execution of a shutdown command within the user data script.

However, we've encountered instances that unexpectedly reboot and rerun the user data script instead of terminating as intended. We cannot reproduce this issue, that occur unexpectedly from time to time.

Noteworthy findings from the cloud-init-output.log contributing to this observation include:

Cloud-init v. 19.3-46.amzn2.0.1 running 'init-local' at Thu, 01 Feb 2024 08:06:53 +0000. Up 4.19 seconds.
Cloud-init v. 19.3-46.amzn2.0.1 running 'init' at Thu, 01 Feb 2024 08:06:54 +0000. Up 4.74 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++

...

/var/lib/cloud/instance/scripts/part-001: line 2:  2599 Terminated [redacted]
Feb 01 08:08:07 cloud-init[2582]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [143]
Cloud-init v. 19.3-46.amzn2.0.1 running 'init-local' at Thu, 01 Feb 2024 08:08:23 +0000. Up 4.32 seconds.
Cloud-init v. 19.3-46.amzn2.0.1 running 'init' at Thu, 01 Feb 2024 08:08:24 +0000. Up 4.89 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++

This indicates that cloud-init runs twice, with the first run "Terminated" for unknown reasons, potentially preventing the shutdown command execution. Subsequently, cloud-init runs again (after a restart that is not supposed to occur?) and the script fails (143) during the second run so no shutdown are called, leading to instances requiring manual cleanup.

We seek insights into the root cause of this behavior and whether it constitutes a bug.

已提問 3 個月前檢視次數 193 次
1 個回答
0

Hi, Did you try this with newer version of Amazon Linux 2023 which has latest version of cloud-init? This is just to rule out any potential limitation/bug with cloud-init version 19.x you have mentioned. Below are few additional points you could verify:

  • Review User Data Script: Check for any commands that might inadvertently cause a reboot rather than a shutdown. Ensure that the script explicitly calls for a shutdown (e.g., shutdown -h now or poweroff) and that there are no conditions under which a reboot command (reboot) could be executed instead.
  • Instance Settings: Verify the instance's shutdown behavior is correctly set to terminate.
  • Cloud-init Configuration: Look into cloud-init's configuration to ensure it's set up to handle user data scripts as expected. You might need to adjust cloud-init settings to prevent reexecution of scripts that should only run once.
已回答 3 個月前
profile picture
專家
已審閱 1 個月前
  • Hi, we were previously using AL2 and plan to transition to the latest version of Amazon Linux in 2023. We'll ensure to provide you with updates if the issue persists at that time.

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南