We recently tried to pull an image from ECR and found that it no longer existed.
Our ECR repositories have a simple lifecycle policy:
{
  "rules": [
    {
      "rulePriority": 2,
      "description": "Expire untagged images",
      "selection": {
        "tagStatus": "untagged",
        "countType": "imageCountMoreThan",
        "countNumber": 5
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 6,
      "description": "Image Retention",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 50
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
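To make the expected behavior concrete: per the AWS documentation, an `imageCountMoreThan` rule sorts the images it selects from newest to oldest by push time and expires everything past `countNumber`. The sketch below is a simplified model of a single evaluation of the "Image Retention" rule on its own. It ignores the untagged rule and how rules interact. The function name `expire_beyond_count` and the tab-separated pushedAt/digest input format are hypothetical, not part of any AWS tool.

```shell
#!/bin/bash
# Hypothetical model of ONE evaluation of an "imageCountMoreThan" rule
# with tagStatus "any", ignoring interactions with other rules.
# Input on stdin: one image per line as "<pushedAt ISO-8601 UTC><TAB><digest>".
# Output: the digests that this single pass would expire.
expire_beyond_count() {
  local count="$1"
  # Newest first; fixed-format UTC timestamps sort correctly as plain strings.
  # Everything after the first $count lines is expired.
  sort -r -k1,1 | tail -n +"$((count + 1))" | cut -f2
}

# Example: 60 images pushed one minute apart, countNumber = 50.
for i in $(seq 0 59); do
  printf '2021-01-01T00:%02d:00Z\tsha256:img%02d\n' "$i" "$i"
done | expire_beyond_count 50
# expires the 10 oldest digests, printed from sha256:img09 down to sha256:img00
```

Because `tagStatus` is `any`, the count of 50 includes untagged images as well as tagged ones.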
We found that, for this repository, the PolicyExecutionEvent had run five times at the same moment (identical epoch timestamps).
As a result, rule 6 was applied five times, and only 12 tagged images remained.
A quick script showed that a few other repositories had the same multiple-execution issue:
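#!/bin/bash
# find_ecr_lifecycle_events.sh: list repositories whose lifecycle policy ran more than once at the same time
ACCOUNTID=${ACCOUNTID:-$(aws sts get-caller-identity --query Account --output text)}
for repo in $(aws ecr describe-repositories --query "repositories[].repositoryName" | jq -r '.[]' | sort)
do
  echo "$repo"
  # lookup-events accepts only one lookup attribute, so filter by repository ARN here and by event name in jq
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=ResourceName,AttributeValue=arn:aws:ecr:us-east-1:${ACCOUNTID}:repository/${repo} \
    --max-items 50 \
    | jq -r '.Events[] | select(.EventName == "PolicyExecutionEvent") | .EventTime' \
    | sort | uniq -c | awk '$1 > 1' | sed 's/^/  /'
done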
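When counting duplicates, note that `uniq -c` left-pads its counts with spaces (GNU coreutils uses a width of seven), so a `grep` pattern that assumes a particular amount of padding, such as `'^ 1 '`, can fail to filter out the singletons. Comparing the count numerically avoids depending on the padding. `find_duplicate_times` is a hypothetical helper name; it reads one timestamp per line, as produced by the `jq` step, and prints each timestamp that occurs more than once together with its count.

```shell
#!/bin/bash
# Hypothetical helper: reads one timestamp per line on stdin and prints
# "<count> <timestamp>" for each timestamp seen more than once.
# The numeric test in awk works regardless of how uniq -c pads its counts.
find_duplicate_times() {
  sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
}

# Example: three executions share one timestamp, one is unique.
printf '%s\n' \
  '2021-06-01T12:00:00Z' \
  '2021-06-01T12:00:00Z' \
  '2021-06-02T12:00:00Z' \
  '2021-06-01T12:00:00Z' \
  | find_duplicate_times
# prints: 3 2021-06-01T12:00:00Z
```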
Does anyone know why AWS sometimes runs this policy multiple times? Should we avoid the imageCountMoreThan count type?
Thanks!