
How to manage dockerized component lifecycle the best way


I have a private Greengrass component which is essentially a wrapper for a docker container. I'm not sure how I should manage its lifecycle. The component is meant to be deployed on an edge device which might suffer from occasional network or power outages.

According to the Greengrass component recipe reference, the startup lifecycle script seems to be the most suitable choice for running a dockerized service. Ultimately the startup script would run docker run -d ... <image-name>, resulting in a running docker container and the component in the RUNNING state. Then, whenever there's a new deployment, the shutdown lifecycle script would be called, stopping and removing that container.

Unfortunately, I can see two main drawbacks of this approach:

  1. By default there's no recovery mechanism in case of container failure. As far as I know, once the startup lifecycle script exits with a success code, the component remains in the RUNNING state. Nucleus has no knowledge of the started container's health. If the container exits unexpectedly, the component still remains in the RUNNING state, so Nucleus has no reason to restart it. At least that's my understanding. I've also verified this scenario by performing a deployment and then, once startup succeeded, manually stopping the docker container. As expected, Nucleus didn't attempt to restart the component.
  2. It seems it's not possible to delegate the container's lifecycle management to docker. Another idea I had was to configure the container's restart policy in docker itself. I could set the restart policy to unless-stopped or on-failure according to the docker docs; however, this would interfere with the startups that Nucleus performs when the host device is restarted. When there's a power outage and my edge device shuts down and then starts automatically, both docker and Nucleus would try to restart the container: docker by its built-in mechanism according to the restart policy, and Nucleus by executing the startup script again. AFAIK, there's no way to configure Nucleus to skip component startup when the system boots. I was thinking about storing information about startups somewhere in the filesystem and then, based on that information, skipping startups triggered by Nucleus upon system boot. It feels a bit hacky tho, so it's not ideal.

What's the recommended way to approach that? Basically, I want three things:

  1. Components are always restarted when there's a failure
  2. Components are brought back to life after power outage in a predictable way
  3. Components are wrappers around docker containers
asked a year ago, 271 views
1 Answer

Hi

For 1, can you elaborate on how you expect your docker component to exit? Greengrass treats any non-zero exit code from a component (including docker ones) as an error, which transitions the component's lifecycle from its previous state to ERRORED. Greengrass will attempt to restart an ERRORED component at most twice before reporting its status as BROKEN. I would imagine that in your case the docker component needs to handle exceptions and ensure it exits with a non-zero exit code that Greengrass can trap. Please note that Greengrass will not restart an ERRORED component indefinitely, only twice at most.

For 2 and 3, your assessment is correct, and I would advise you to let Greengrass control the component's lifecycle rather than Docker. However, I do believe you can use the container's file system to your advantage and, if necessary, skip a lifecycle step from within the component itself. For example, in your STARTUP code, if path A exists, just exit with a success code. Unfortunately, Greengrass itself does not allow skipping lifecycle states.
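
The "if path A exists, just exit with a success code" idea could be sketched roughly as follows. This is only an illustration, not an official Greengrass pattern: the function names, the marker path, and the start_container / stop_container callables are all hypothetical, and it assumes the lifecycle scripts are written in Python and that the marker lives somewhere that survives a reboot.

```python
from pathlib import Path
from typing import Callable


def startup(marker: Path, start_container: Callable[[], int]) -> int:
    """Run the startup step unless the marker file says it can be skipped.

    Returns the exit code the lifecycle script should finish with.
    """
    if marker.exists():
        # "Path A exists": report success without starting anything.
        return 0
    exit_code = start_container()
    if exit_code == 0:
        # Record the successful start so later invocations are skipped.
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch()
    return exit_code


def shutdown(marker: Path, stop_container: Callable[[], int]) -> int:
    """Stop the container and clear the marker so the next startup runs again."""
    exit_code = stop_container()
    marker.unlink(missing_ok=True)  # hypothetical cleanup; needs Python 3.8+
    return exit_code
```

In a real script, start_container would presumably wrap the docker run -d command from the question and return its exit code. Clearing the marker in the shutdown step is a design choice of this sketch, so that a new deployment starts the container again instead of being skipped.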

Hope this helps!

AWS
answered a year ago
  • Thanks for answering :)

    What you've described in the first paragraph is correct; however, with a startup script the component is launched and then GG doesn't care about it (until the next system restart), so it won't pick up a non-zero exit code.

    It works for a component launched with the run script, because then GG actually monitors whether the component has exited with a non-zero code. I redesigned my components to use the run lifecycle and it works as it should now.

    I'm not sure tho what the purpose of the startup lifecycle is then; it seems a bit useless. I also believe that the docs are misleading.
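
For readers wondering what the switch to the run lifecycle might look like, here is a minimal sketch. It assumes the run script is a small Python wrapper; the function names, image name, and container name are hypothetical and are not taken from the original poster's component. The key point is that docker run is started without -d, so the script stays alive as long as the container does and finishes with the container's exit code.

```python
import subprocess
from typing import List


def build_run_command(image: str, name: str) -> List[str]:
    """Build a foreground `docker run` command (deliberately without -d)."""
    return ["docker", "run", "--rm", "--name", name, image]


def run_foreground(command: List[str]) -> int:
    """Block until the command exits and return its exit code."""
    return subprocess.run(command).returncode
```

A run script built on this would end with something like sys.exit(run_foreground(build_run_command(image, name))), so that a crashing container surfaces as a non-zero exit code that Greengrass can see. A shutdown step that stops the container is probably still needed, as in the original design.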
