Stop a Greengrass component from entering BROKEN after 3 crashes


I want to configure Nucleus to always restart my component when it exits with an error, even if it has already been restarted 3 times. Right now, when my component errors, it restarts successfully and re-enters the RUNNING state the first 3 times. Based on this documentation, I would expect each of these to count as a "successful" recovery, so the component should keep restarting to recover from the ERRORED state:

This step runs when a component enters the ERRORED state. If the component becomes ERRORED three times without successfully recovering, the component changes to the BROKEN state. To fix a BROKEN component, you must deploy it again.

However, that is not what I see. After 3 recoveries, the component enters the BROKEN state and can only be recovered with the "Reinstall" button in the LocalDebugConsole. Part of the issue seems to be that, from my point of view, the component restarts successfully, but Greengrass treats the ERRORED -> RUNNING (for some length of time) -> ERRORED sequence as a failed restart. Is there a way to tell Nucleus to perform this "reinstall" action on error? As an alternative, I am wondering whether I could use startup rather than run and have my component manually report a successful transition to RUNNING.

asked a year ago · 401 views
1 Answer
Accepted Answer

When a component becomes ERRORED 3 times within 1 hour, it is BROKEN and needs to be reinstalled. You can do that from the local debug console or by deploying a fixed version of the component. If your component keeps erroring, you need to address the root cause of the issue and make it stop exiting with an error.

Cheers,

Michael

AWS
EXPERT
answered a year ago
  • So the requirement to enter BROKEN is crashing 3 times in 1 hour, not crashing 3 times in total? That makes sense and resolves my question (not crashing that frequently is a perfectly reasonable requirement). I just want to confirm I understand correctly.
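To make the "3 times in 1 hour" versus "3 times in total" distinction concrete, here is a minimal Python sketch. It models the rule from the accepted answer as a sliding one-hour window over ERRORED events. The sliding-window interpretation is an assumption. The names `ComponentModel`, `report_error`, `reinstall`, and `simulate` are hypothetical, and this is not the Nucleus source code or any Greengrass API.

```python
from collections import deque
from datetime import datetime, timedelta

# Thresholds taken from the accepted answer: 3 ERRORED transitions within 1 hour.
MAX_ERRORS = 3
ERROR_WINDOW = timedelta(hours=1)


class ComponentModel:
    """Hypothetical, simplified model of the ERRORED -> BROKEN rule.

    Assumption: errors are counted in a sliding one-hour window.
    This is an illustration only, not how Nucleus is implemented.
    """

    def __init__(self) -> None:
        self.state = "RUNNING"
        self.error_times: deque = deque()

    def report_error(self, when: datetime) -> str:
        """Record an ERRORED transition at `when` and return the resulting state."""
        if self.state == "BROKEN":
            return self.state  # only a reinstall/redeploy clears BROKEN
        self.error_times.append(when)
        # Forget errors that are more than one hour older than this one.
        while when - self.error_times[0] > ERROR_WINDOW:
            self.error_times.popleft()
        # Too many recent errors -> BROKEN; otherwise the restart succeeds.
        self.state = "BROKEN" if len(self.error_times) >= MAX_ERRORS else "RUNNING"
        return self.state

    def reinstall(self) -> None:
        """Model of the "Reinstall" action: clear BROKEN and the error history."""
        self.state = "RUNNING"
        self.error_times.clear()


def simulate(gap_minutes: int, crashes: int) -> str:
    """Crash `crashes` times, `gap_minutes` apart, and return the final state."""
    model = ComponentModel()
    start = datetime(2024, 1, 1)
    for i in range(crashes):
        model.report_error(start + timedelta(minutes=gap_minutes * i))
    return model.state


if __name__ == "__main__":
    print(simulate(gap_minutes=10, crashes=3))   # 3 crashes in 20 minutes: BROKEN
    print(simulate(gap_minutes=45, crashes=10))  # never 3 in one hour: RUNNING
```

Under this model, the total number of restarts does not matter. Only how many errors land inside the same one-hour window does, which matches the reading confirmed in the comment above.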
