Rollback docker component with AWS Greengrass V2


Hey there,

I am trying to test the rollback function when deploying a Docker container to a fleet of Raspberry Pis. To do that, I first deployed container 1, which runs a Python script that prints "Hello, world!" to the console. I then created a deliberately broken container 2, whose Docker command tries to execute a Python script that does not exist. When I revise the deployment to use the component for container 2 instead of the previously running container 1, the component fails as expected and enters the broken state (currentState=BROKEN). However, no rollback to the previously working deployment with container 1 occurs. Why not?

The deployment status always shows "Succeeded" but the device status turns to "Unhealthy".

My deployment.json is as follows:

{
    "targetArn": "arn:aws:iot:eu-central-1:242944196659:thinggroup/flappiedoors",
    "revisionId": "40",
    "deploymentId": "ba6b2009-15c8-4b7b-ab90-905211bb3894",
    "deploymentName": "test_deployments",
    "deploymentStatus": "ACTIVE",
    "iotJobId": "1f18b898-9d95-4890-97c4-4c1ee6a68282",
    "iotJobArn": "arn:aws:iot:eu-central-1:242944196659:job/1f18b898-9d95-4890-97c4-4c1ee6a68282",
    "components": {
        "aws.greengrass.LogManager": {
            "componentVersion": "2.3.1",
            "configurationUpdate": {
                "merge": "{\"logsUploaderConfiguration\":{\"systemLogsConfiguration\":{\"uploadToCloudWatch\":\"true\",\"deleteLogFileAfterCloudUpload\":\"true\"},\"componentLogsConfigurationMap\":{\"com.example.MyPrivateDockerComponent\":{\"deleteLogFileAfterCloudUpload\":\"true\"}}}}"
            },
            "runWith": {}
        },
        "aws.greengrass.SecureTunneling": {
            "componentVersion": "1.0.13"
        },
        "com.example.MyPrivateDockerComponent": {
            "componentVersion": "2.0.0"
        }
    },
    "deploymentPolicies": {
        "failureHandlingPolicy": "ROLLBACK",
        "componentUpdatePolicy": {
            "timeoutInSeconds": 60,
            "action": "NOTIFY_COMPONENTS"
        }
    },
    "iotJobConfiguration": {
        "jobExecutionsRolloutConfig": {
            "maximumPerMinute": 1000
        }
    },
    "creationTimestamp": "2023-03-27T12:31:28.764Z",
    "isLatestForTarget": true,
    "tags": {}
}

For reference, this is the component recipe for the Docker containers. The only things I change between the two versions are the "ComponentVersion" and the container tag in the "Run" and "Shutdown" commands.

{
  "RecipeFormatVersion": "2020-01-25",
  "ComponentName": "com.example.MyPrivateDockerComponent",
  "ComponentVersion": "2.0.0",
  "ComponentType": "aws.greengrass.generic",
  "ComponentDescription": "A component that runs a Docker container from a private Amazon ECR image.",
  "ComponentPublisher": "Amazon",
  "ComponentDependencies": {
    "aws.greengrass.DockerApplicationManager": {
      "VersionRequirement": ">=2.0.0 <2.1.0",
      "DependencyType": "HARD"
    },
    "aws.greengrass.TokenExchangeService": {
      "VersionRequirement": ">=2.0.0 <2.1.0",
      "DependencyType": "HARD"
    }
  },
  "Manifests": [
    {
      "Platform": {
        "os": "all"
      },
      "Lifecycle": {
        "Run": "docker run 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:0.0.1",
        "Shutdown": "docker stop $(docker ps -a -q --filter ancestor=242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:0.0.1)"
      },
      "Artifacts": [
        {
          "Uri": "docker:242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror",
          "Unarchive": "NONE",
          "Permission": {
            "Read": "OWNER",
            "Execute": "NONE"
          }
        }
      ]
    }
  ],
  "Lifecycle": {}
}

These are my component logs:

2023-03-27T12:33:19.673Z [INFO] (pool-2-thread-33) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Run, serviceName=com.example.MyPrivateDockerComponent, currentState=STARTING, command=["docker run 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror"]}
2023-03-27T12:33:21.952Z [WARN] (Copier) com.example.MyPrivateDockerComponent: stderr. python3: can't open file 'hello_world.py': [Errno 2] No such file or directory. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Run, serviceName=com.example.MyPrivateDockerComponent, currentState=RUNNING}
2023-03-27T12:33:22.779Z [INFO] (Copier) com.example.MyPrivateDockerComponent: Run script exited. {exitCode=2, serviceName=com.example.MyPrivateDockerComponent, currentState=RUNNING}
2023-03-27T12:33:22.807Z [INFO] (pool-2-thread-31) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=STOPPING, command=["docker stop $(docker ps -a -q --filter ancestor=242944196659.dkr.ecr.eu-centra..."]}
2023-03-27T12:33:23.546Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. f07d93c3c983. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=STOPPING}
2023-03-27T12:33:23.594Z [INFO] (pool-2-thread-31) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Run, serviceName=com.example.MyPrivateDockerComponent, currentState=STARTING, command=["docker run 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror"]}
2023-03-27T12:33:25.985Z [WARN] (Copier) com.example.MyPrivateDockerComponent: stderr. python3: can't open file 'hello_world.py': [Errno 2] No such file or directory. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Run, serviceName=com.example.MyPrivateDockerComponent, currentState=RUNNING}
2023-03-27T12:33:26.714Z [INFO] (Copier) com.example.MyPrivateDockerComponent: Run script exited. {exitCode=2, serviceName=com.example.MyPrivateDockerComponent, currentState=RUNNING}
2023-03-27T12:33:26.756Z [INFO] (pool-2-thread-31) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=STOPPING, command=["docker stop $(docker ps -a -q --filter ancestor=242944196659.dkr.ecr.eu-centra..."]}
2023-03-27T12:33:27.511Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. 4ba1ed3b2ae0. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=STOPPING}
2023-03-27T12:33:27.513Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. f07d93c3c983. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=STOPPING}
2023-03-27T12:33:27.560Z [INFO] (pool-2-thread-31) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Run, serviceName=com.example.MyPrivateDockerComponent, currentState=STARTING, command=["docker run 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror"]}
2023-03-27T12:33:30.461Z [WARN] (Copier) com.example.MyPrivateDockerComponent: stderr. python3: can't open file 'hello_world.py': [Errno 2] No such file or directory. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Run, serviceName=com.example.MyPrivateDockerComponent, currentState=RUNNING}
2023-03-27T12:33:31.206Z [INFO] (Copier) com.example.MyPrivateDockerComponent: Run script exited. {exitCode=2, serviceName=com.example.MyPrivateDockerComponent, currentState=RUNNING}
2023-03-27T12:33:31.221Z [INFO] (pool-2-thread-31) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=BROKEN, command=["docker stop $(docker ps -a -q --filter ancestor=242944196659.dkr.ecr.eu-centra..."]}
2023-03-27T12:33:31.943Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. 8523b3d4bc02. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=BROKEN}
2023-03-27T12:33:31.944Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. 4ba1ed3b2ae0. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=BROKEN}
2023-03-27T12:33:31.944Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. f07d93c3c983. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=BROKEN}
  • Hi, could you update the description to provide the recipes for the component (assume 1.0.0 and 2.0.0) along with the contents of the deployment.json used? As ROLLBACK is performed at the deployment level, having the log entries would also be helpful.

    Also, in the recipe provided, Run and Shutdown aren't compatible. As a result, the deployment can be complete and successful even if the Run lifecycle script fails soon after running.

    It is best to use Startup and Shutdown so that the return code of docker run is properly tracked.

  • @Gavin_A, thanks for your comment. I updated the question; I hope this helps. Also, I am using Shutdown to make sure that the previous container is stopped once I deploy a new one. Should I just replace "Run" with "Startup"?

1 Answer

The deployment does not roll back because, by the time the component fails, Greengrass has already reported a successful deployment status back to the Greengrass service. As @Gavin_A says, you need to run docker run as a Startup script and not as a Run script. Remember to run the container in background (detached) mode, since Startup would otherwise time out and fail.

Greengrass expects the scripts run in Startup to exit before the timeout. If the exit code is 0, the deployment is considered successful. If it is non-zero, the deployment is considered failed and, if rollback has been configured, Greengrass rolls back to the previous working component version.

AWS
EXPERT
answered a year ago
  • Thanks for your advice. I changed the component's recipe by replacing "Run" with "Startup" and adding a '-d' flag to the docker run command in order to run it in background (detached) mode. Rollback still does not work. This time, however, the device stays healthy and the job status is Successful, but there is no Docker container running on the Raspberry Pi.

    These are the logs I received.

    2023-03-31T09:20:19.457Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. b50ef0aa98d2. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=STOPPING}
    2023-03-31T09:20:19.519Z [INFO] (pool-2-thread-24) com.example.MyPrivateDockerComponent: shell-runner-start. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Startup, serviceName=com.example.MyPrivateDockerComponent, currentState=STARTING, command=["docker run -d --rm 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:f..."]}
    2023-03-31T09:20:20.207Z [INFO] (Copier) com.example.MyPrivateDockerComponent: stdout. 60669c1a58e789fb25dabcdaf2e60c988b1a874652b342062629d7feb8816a14. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Startup, serviceName=com.example.MyPrivateDockerComponent, currentState=STARTING}
    2023-03-31T09:20:21.814Z [INFO] (Copier) com.example.MyPrivateDockerComponent: Startup script exited. {exitCode=0, serviceName=com.example.MyPrivateDockerComponent, currentState=STARTING}
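
The log above shows the Startup script printing a container ID and exiting with code 0. That matches how `docker run -d` behaves: it returns as soon as the container has been started, so a crash inside the container a moment later never reaches Greengrass. One possible workaround, offered only as a sketch, is to have Startup call a small wrapper that starts the container, waits briefly, and returns a non-zero code if the container is no longer running. The function names, the grace period, and the idea of shipping this as a separate script are assumptions for illustration and are not part of the original recipe.

```python
import subprocess
import sys
import time


def run_cmd(args):
    """Run a command and return (exit_code, stripped stdout)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def start_and_verify(image, grace_seconds=5.0, run=run_cmd, sleep=time.sleep):
    """Start `image` detached, then confirm it is still running.

    Returns 0 if the container is running after the grace period,
    otherwise a non-zero value intended as the Startup exit code.
    """
    code, container_id = run(["docker", "run", "-d", "--rm", image])
    if code != 0 or not container_id:
        # docker could not create or start the container at all
        return code or 1

    # Assumed grace period: give a crashing entrypoint time to exit.
    sleep(grace_seconds)

    # With --rm an exited container is removed, so inspect itself fails.
    code, state = run(
        ["docker", "inspect", "--format", "{{.State.Running}}", container_id]
    )
    if code != 0 or state != "true":
        return 1
    return 0


def main(argv):
    """Hypothetical entry point: argv[1] is the image reference."""
    if len(argv) != 2:
        print("usage: start_container.py IMAGE", file=sys.stderr)
        return 2
    return start_and_verify(argv[1])
```

If this were saved as a script (the name `start_container.py` is hypothetical) ending with `sys.exit(main(sys.argv))`, the Startup lifecycle could call it with the image tag in place of the bare `docker run -d` command. The grace period would need to stay below the Startup timeout, and a fixed wait only catches containers that fail quickly, as the missing `hello_world.py` case does.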
    
