I want to configure Nucleus to always restart my component if it exits with an error, even if it has already restarted 3 times. What I am seeing right now is that if my component errors, it successfully restarts and re-enters the RUNNING state the first 3 times. Based on this documentation, I would expect each of these to count as a "successful" recovery, so that Nucleus keeps restarting the component to recover from the ERRORED state:
This step runs when a component enters the ERRORED state. If the component becomes ERRORED three times without successfully recovering, the component changes to the BROKEN state. To fix a BROKEN component, you must deploy it again.
However, that is not what I am seeing: after 3 recoveries the component enters the BROKEN state and cannot be recovered except by using the "Reinstall" button in the LocalDebugConsole. Part of the issue seems to be that, from my view, the component is restarting successfully, but Greengrass seems to treat the ERRORED -> RUNNING (for some length of time) -> ERRORED transition as a failed restart. Is there some way to tell Nucleus to do this "reinstall" action on error? I am wondering if an alternative is to use startup rather than run and manually report the successful state transition to RUNNING from my component.
So the requirement to enter BROKEN is crashing 3 times in 1 hour, and not crashing 3 times period? That totally makes sense and resolves my question (not crashing that frequently is a totally reasonable requirement), just want to confirm I am understanding correctly.
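To check my understanding, here is a rough Python sketch of the rule as I read it. This is only my mental model, not Nucleus's actual implementation: the class name, the method name, the threshold of 3, and the one-hour window are all assumptions based on the explanation above.

```python
from collections import deque

ERROR_LIMIT = 3        # assumed: errors needed to become BROKEN
WINDOW_SECONDS = 3600  # assumed: one-hour sliding window


class ErrorWindow:
    """Hypothetical model of the 'N errors within a time window' rule."""

    def __init__(self, limit=ERROR_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.errors = deque()  # timestamps (seconds) of recent errors

    def record_error(self, now):
        """Record an error at time `now` and return the resulting state."""
        self.errors.append(now)
        # Forget errors that are older than the window.
        while self.errors and now - self.errors[0] >= self.window:
            self.errors.popleft()
        return "BROKEN" if len(self.errors) >= self.limit else "ERRORED"


# Three crashes within 20 minutes: the third one trips the limit.
fast = ErrorWindow()
fast_states = [fast.record_error(t) for t in (0, 600, 1200)]

# Three crashes spread more than an hour apart: never BROKEN.
slow = ErrorWindow()
slow_states = [slow.record_error(t) for t in (0, 4000, 8000)]
```

If this model is right, a component that crashes only occasionally keeps getting restarted indefinitely, which is the behavior I was after.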