AWS IoT Greengrass: How does the deployment Rollback function work? What kind of errors does it roll back?


In AWS IoT Greengrass, there is a Rollback option for deployments under "Deployment Policies". If I understood correctly, it rolls devices back to their previous configuration if the deployment fails. I wanted to test this, so I purposely built a non-working ECR Docker image and deployed it through a Greengrass component. (I basically introduced a Python FileNotFoundError by having my Dockerfile run a nonexistent Python script.)

Before that, I had a working container running. What I would like to see is my device rolling back to running the old (working) container after realizing that the container failed. However, this doesn't happen. Only the device state changes to unhealthy in the AWS console.

Now my question: What kind of errors is this Rollback function able to detect and handle? And do you have any suggestions on how I could achieve my goal of rolling back the device if the Docker CMD, or any file it runs, raises an error?

Thanks a lot for your help!

Asked 1 year ago · 412 views
2 Answers

Hi. Given that your device is unhealthy, it seems the rollback didn't occur or it failed. Please check the device deployment status (describe-job-execution) as described here (or look in the console): https://docs.aws.amazon.com/greengrass/v2/developerguide/check-deployment-status.html#check-device-deployment-status

And please check the Greengrass and component logs as described here: https://docs.aws.amazon.com/greengrass/v2/developerguide/monitor-logs.html
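Both checks can also be scripted with the AWS CLI. The following is only a sketch: the function name `check_deployment` is hypothetical, the thing name and job ID are placeholders you supply yourself, and it assumes the AWS CLI is installed and configured with permissions for `iot:DescribeJobExecution` and `greengrass:ListInstalledComponents`.

```shell
#!/bin/sh
# Sketch: print the job execution status and the installed component states
# for one core device. Both arguments are placeholders supplied by the caller.
check_deployment() {
    thing_name="$1"   # IoT thing name of the Greengrass core device
    job_id="$2"       # ID of the IoT job that carried the deployment

    # Status of the job execution on this device.
    aws iot describe-job-execution \
        --job-id "$job_id" \
        --thing-name "$thing_name" \
        --query 'execution.{status: status, details: statusDetails}'

    # Components currently installed on the device, with their lifecycle state.
    aws greengrassv2 list-installed-components \
        --core-device-thing-name "$thing_name" \
        --query 'installedComponents[].{name: componentName, version: componentVersion, state: lifecycleState}'
}
```

Calling `check_deployment <thing-name> <job-id>` prints the job status first and then one entry per installed component, which makes it easy to see whether a component ended up in a state such as BROKEN even though the job itself did not fail.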

Greg_B (AWS, Expert)
answered 1 year ago
  • Thanks for your answer. I've been trying to replace one component of my deployment with a new, different one (that is meant to fail). Maybe the rollback only works between different versions of the same component?


The job execution always reported SUCCESSFUL, but the device is unhealthy.

This is my Component description:

{
  "RecipeFormatVersion": "2020-01-25",
  "ComponentName": "com.example.MyPrivateDockerComponent",
  "ComponentVersion": "1.1.6",
  "ComponentType": "aws.greengrass.generic",
  "ComponentDescription": "A component that runs a Docker container from a private Amazon ECR image.",
  "ComponentPublisher": "Amazon",
  "ComponentDependencies": {
    "aws.greengrass.DockerApplicationManager": {
      "VersionRequirement": ">=2.0.0 <2.1.0",
      "DependencyType": "HARD"
    },
    "aws.greengrass.TokenExchangeService": {
      "VersionRequirement": ">=2.0.0 <2.1.0",
      "DependencyType": "HARD"
    }
  },
  "Manifests": [
    {
      "Platform": {
        "os": "all"
      },
      "Lifecycle": {
        "Run": "docker run --rm 242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror",
        "Shutdown": "docker stop $(docker ps -q --filter ancestor=242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror)"
      },
      "Artifacts": [
        {
          "Uri": "docker:242944196659.dkr.ecr.eu-central-1.amazonaws.com/test_repo:fileerror",
          "Unarchive": "NONE",
          "Permission": {
            "Read": "OWNER",
            "Execute": "NONE"
          }
        }
      ]
    }
  ],
  "Lifecycle": {}
}

And these are the component errors I get on CloudWatch:

[WARN] (Copier) com.example.MyPrivateDockerComponent: stderr. Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=BROKEN

and

2023-03-27T09:11:24.110Z [WARN] (pool-2-thread-14) com.example.MyPrivateDockerComponent: shell-runner-error. {scriptName=services.com.example.MyPrivateDockerComponent.lifecycle.Shutdown, serviceName=com.example.MyPrivateDockerComponent, currentState=BROKEN, command=["docker stop $(docker ps -q --filter ancestor=242944196659.dkr.ecr.eu-central-1..."]}

So apparently there is a problem with my Shutdown command. Because the container exits and is removed immediately after it fails, the Shutdown command can no longer find a container to stop, so `docker stop` is called without arguments and prints its usage text. But is that really the reason why the Rollback doesn't work and the device becomes unhealthy?
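One way to keep the Shutdown step from failing when the container has already exited is to guard the `docker stop` call. This is a minimal sketch, assuming a POSIX shell; the function name `stop_containers_for_image` is hypothetical. It only addresses the Shutdown warning shown in the logs above and is not confirmed to be the reason the rollback did not happen.

```shell
#!/bin/sh
# Hypothetical helper: stop every running container started from the given
# image, and do nothing (exit status 0) if no such container is running.
stop_containers_for_image() {
    image="$1"

    # IDs of running containers created from this image (may be empty).
    ids="$(docker ps -q --filter "ancestor=${image}")"

    if [ -z "$ids" ]; then
        echo "no running container for ${image}; nothing to stop"
        return 0
    fi

    # Unquoted on purpose: each container ID becomes a separate argument.
    docker stop $ids
}
```

The Shutdown step would then call `stop_containers_for_image` with the same image URI that the Run step uses, so an already-removed container no longer turns the shutdown itself into an error.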

answered 1 year ago
