Nucleus upgrade restarts a component install step before timeout expires

1

I think this might be a bug, here is the scenario - in the same deployment upgrade aws.greengrass.Nucleus from any version (in my case 2.9.1) to 2.9.4 and an example component called com.example.longinstall . Nucleus doesn't seem to respect the installing component install step timeout, and restarts everything in the middle of the install. This is a problem for things like MSIs which are backgrounded and can't be interrupted.

Component source:

---
RecipeFormatVersion: "2020-01-25"
ComponentName: "com.example.longinstall"
ComponentVersion: "1.0.0"
ComponentType: "aws.greengrass.generic"
ComponentDescription: "This component takes 120s to install"
ComponentPublisher: "Me"
Manifests:
- Platform:
    os: "windows"
  Lifecycle:
    Install:
      Script: "powershell -command \"for ($i = 0; $i -lt 120; $i++){ 'Installing ('+$i+')'\
        \ | Write-Output; Start-Sleep -Seconds 1 }\""
      RequiresPrivilege: true
      Timeout: 150
    Startup:
      Script: "powershell -command \"write-output Startup\""
      RequiresPrivilege: true
    Shutdown:
      Script: "powershell -command \"write-output Shutdown\""
      RequiresPrivilege: true
  Artifacts: []
Lifecycle: {}

Log from component:

2023-03-09T00:58:47.101Z [INFO] (pool-2-thread-22) com.example.longinstall: shell-runner-start. {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW, command=["powershell -command \u0022for ($i = 0; $i -lt 120; $i++){ 'Installing ('+$i+')' | W..."]}
2023-03-09T00:58:48.460Z [INFO] (Copier) com.example.longinstall: stdout. Installing (0). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T00:58:49.484Z [INFO] (Copier) com.example.longinstall: stdout. Installing (1). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T00:58:50.488Z [INFO] (Copier) com.example.longinstall: stdout. Installing (2). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T00:58:51.502Z [INFO] (Copier) com.example.longinstall: stdout. Installing (3). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
..snip..
2023-03-09T00:59:46.057Z [INFO] (Copier) com.example.longinstall: stdout. Installing (57). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T00:59:47.069Z [INFO] (Copier) com.example.longinstall: stdout. Installing (58). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T00:59:53.984Z [INFO] (pool-2-thread-10) com.example.longinstall: shell-runner-start. {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW, command=["powershell -command \u0022for ($i = 0; $i -lt 120; $i++){ 'Installing ('+$i+')' | W..."]}
2023-03-09T00:59:55.743Z [INFO] (Copier) com.example.longinstall: stdout. Installing (0). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T00:59:56.752Z [INFO] (Copier) com.example.longinstall: stdout. Installing (1). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
..snip..
2023-03-09T01:01:55.966Z [INFO] (Copier) com.example.longinstall: stdout. Installing (119). {scriptName=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
2023-03-09T01:01:57.019Z [INFO] (pool-2-thread-10) com.example.longinstall: shell-runner-start. {scriptName=services.com.example.longinstall.lifecycle.Startup.Script, serviceName=com.example.longinstall, currentState=STARTING, command=["powershell -command \u0022write-output Startup\u0022"]}
2023-03-09T01:01:58.281Z [INFO] (Copier) com.example.longinstall: stdout. Startup. {scriptName=services.com.example.longinstall.lifecycle.Startup.Script, serviceName=com.example.longinstall, currentState=STARTING}
  • Where do you see the component is being "restarted"? If the startup is timing out you will see the log "Timeout in install" and the component will move to the errored state and then retry the installation. This log will be in the greengrass.log file.

    I see in your log that the startup script does execute which means that the installation must have completed.

    The timeout is definitely respected, but you may have something else going on which is causing a restart. You will need to review the greengrass.log file for things relating to your component.

    Cheers,

    Michael

  • Another possibility is that you are doing a local deployment but com.example.longinstall also exists in your AWS account. If you deploy, even locally, the cloud version of a component version will be deployed, not the local copy. So if this is the case, make sure you deploy a version which does not exist in your account.

  • This was with a cloud deployment, no local deployments ever on this machine. The deployment file only includes these components:

    "components": {
            "aws.greengrass.Cli": {
                "componentVersion": "2.9.4"
            },
            "aws.greengrass.LocalDebugConsole": {
                "componentVersion": "2.2.7"
            },
            "aws.greengrass.Nucleus": {
                "componentVersion": "2.9.4"
            },
            "com.example.longinstall": {
                "componentVersion": "1.0.0"
            }
        },
    

    You're right, it does complete the second time, but the timeout fires in greengrass.log at 00:59:17, when the install section started at 00:58:47, suspiciously 30 seconds. I can attach a santized log here in a minute.

  • Looks like there's no attachment feature and each comment can only be 600char, so I'll put it in code blocks and hopefully not make the thread unreadable: Nuclueus restart

    2023-03-09T00:58:45.218Z [INFO] (pool-2-thread-16) com.aws.greengrass.deployment.activator.DeploymentActivator: Nucleus restart requested to complete bootstrap task. {}
    

    Component is restarted (but not installed?)

    2023-03-09T00:58:47.054Z [INFO] (Serialized listener processor) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-config-change. Requesting restart for component. {configNode=services.com.example.longinstall.lifecycle.Install.Script, serviceName=com.example.longinstall, currentState=NEW}
    

    (part1)

  • Shutdown timeout exception:

    2023-03-09T00:59:17.073Z [ERROR] (pool-2-thread-16) com.aws.greengrass.lifecyclemanager.KernelLifecycle: services-shutdown-errored. {unclosedServices=[com.example.longinstall]}
    java.util.concurrent.TimeoutException
      at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
      at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
      at com.aws.greengrass.lifecyclemanager.KernelLifecycle.stopAllServices(KernelLifecycle.java:462)
      at com.aws.greengrass.lifecyclemanager.KernelLifecycle.softShutdown(KernelLifecycle.java:478)
      at com.aws.greengrass.lifecyclemanager.KernelLifecycle.shutdown(KernelLifecycle.java:516)
      at com.aws.greengrass.lifecyclemanager.KernelLifecycle.shutdown(KernelLifecycle.java:498)
      at com.aws.greengrass.lifecyclemanager.Kernel.shutdown(Kernel.java:273)
    <more stack trace>
    

    (part 2)

psact
asked a year ago77 views
No Answers

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions