One customer has a Config rule to detect drift on our stacks. Since Friday, all monitored stacks have been throwing the "Internal Failure" error. We could narrow it down to certain resources that cause this error when drift detection is run on the complete stack. So far these are AWS::IAM::ManagedPolicy and AWS::Config::ConfigRule.
Here is a PoC to reproduce this:
AWSTemplateFormatVersion: "2010-09-09"
Description: PoC stack for Failed to detect drift on resources Internal Failure
Resources:
  S3Bucket:
    Type: "AWS::S3::Bucket"
    DeletionPolicy: Delete
  DenyAllPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Deny"
            Action:
              - "*"
            Resource: "*"
Once the stack is deployed, you can run these commands to see the behavior.
This stores the drift detection ID in the shell variable DRIFT_ID (make sure to replace <stack-name>):
$ DRIFT_ID=$(aws cloudformation detect-stack-drift --stack-name <stack-name> --query StackDriftDetectionId --output text )
That ID is then used in this command to get the actual results of the drift detection:
$ aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id $DRIFT_ID
{
    "StackId": "arn:aws:cloudformation:<region>:<account_id>:stack/drift-poc/<id>",
    "StackDriftDetectionId": "<detection_id>",
    "StackDriftStatus": "IN_SYNC",
    "DetectionStatus": "DETECTION_FAILED",
    "DetectionStatusReason": "{\"Summary\":\"Failed to detect drift on resources [S3Bucket]\",\"Failures\":[{\"Resource\":\"S3Bucket\",\"FailureReason\":\"Internal Failure\"}]}",
    "DriftedStackResourceCount": 0,
    "Timestamp": "2024-03-12T07:59:21.951000+00:00"
}
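Since DetectionStatusReason is itself a JSON document encoded as a string, it can help to decode it when checking many stacks. Below is a minimal Python sketch that does this for a response shaped like the one above; the helper name failed_resources is made up for illustration, and the fallback for non-JSON reasons is an assumption about other failure modes, not documented behavior.

```python
import json


def failed_resources(status: dict) -> dict:
    """Map logical resource ID -> failure reason for a drift detection status response."""
    reason = status.get("DetectionStatusReason")
    if not reason:
        return {}
    try:
        details = json.loads(reason)
    except json.JSONDecodeError:
        # Assumption: in other failure modes the reason may be plain text, not JSON.
        return {}
    if not isinstance(details, dict):
        return {}
    return {f["Resource"]: f["FailureReason"] for f in details.get("Failures", [])}


# Sample taken from the output shown above (IDs omitted).
status = {
    "DetectionStatus": "DETECTION_FAILED",
    "DetectionStatusReason": "{\"Summary\":\"Failed to detect drift on resources [S3Bucket]\",\"Failures\":[{\"Resource\":\"S3Bucket\",\"FailureReason\":\"Internal Failure\"}]}",
}
print(failed_resources(status))  # {'S3Bucket': 'Internal Failure'}
```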
What we also tried was to run a drift detection for all resources individually, which worked fine.
$ aws cloudformation detect-stack-resource-drift --stack-name drift-poc --logical-resource-id S3Bucket
{
    "StackResourceDrift": {
        "StackId": "arn:aws:cloudformation:<region>:<account_id>:stack/drift-poc/<id>",
        "LogicalResourceId": "S3Bucket",
        "PhysicalResourceId": "drift-poc-s3bucket-<hash>",
        "ResourceType": "AWS::S3::Bucket",
        "ExpectedProperties": "{\"Tags\":[{\"Key\":\"project\",\"Value\":\"drift-poc\"}]}",
        "ActualProperties": "{\"Tags\":[{\"Key\":\"project\",\"Value\":\"drift-poc\"}]}",
        "PropertyDifferences": [],
        "StackResourceDriftStatus": "IN_SYNC",
        "Timestamp": "2024-03-12T08:52:41.603000+00:00"
    }
}
$ aws cloudformation detect-stack-resource-drift --stack-name drift-poc --logical-resource-id DenyAllPolicy
{
    "StackResourceDrift": {
        "StackId": "arn:aws:cloudformation:<region>:<account_id>:stack/drift-poc/<id>",
        "LogicalResourceId": "DenyAllPolicy",
        "PhysicalResourceId": "arn:aws:IAM::<account_id>:policy/drift-poc-DenyAllPolicy-peOszzUwXpYh",
        "ResourceType": "AWS::IAM::ManagedPolicy",
        "ExpectedProperties": "{\"PolicyDocument\":{\"Version\":\"2012-10-17\",\"Statement\":[{\"Action\":[\"*\"],\"Resource\":\"*\",\"Effect\":\"Deny\"}]}}",
        "ActualProperties": "{\"PolicyDocument\":{\"Version\":\"2012-10-17\",\"Statement\":[{\"Action\":[\"*\"],\"Resource\":\"*\",\"Effect\":\"Deny\"}]}}",
        "PropertyDifferences": [],
        "StackResourceDriftStatus": "IN_SYNC",
        "Timestamp": "2024-03-12T08:55:08.668000+00:00"
    }
}
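For stacks with more than a couple of resources, the per-resource checks above could be scripted. The following is a rough Python sketch, assuming a boto3 CloudFormation client is passed in; the function name drift_per_resource is hypothetical. It uses the list_stack_resources paginator and detect_stack_resource_drift, and records any exception per resource instead of aborting, since not every resource type supports drift detection.

```python
def drift_per_resource(cfn, stack_name: str) -> dict:
    """Run drift detection resource by resource.

    Returns a dict of logical resource ID -> drift status, or an
    "ERROR: ..." string if the call for that resource raised.
    """
    results = {}
    paginator = cfn.get_paginator("list_stack_resources")
    for page in paginator.paginate(StackName=stack_name):
        for resource in page["StackResourceSummaries"]:
            logical_id = resource["LogicalResourceId"]
            try:
                response = cfn.detect_stack_resource_drift(
                    StackName=stack_name,
                    LogicalResourceId=logical_id,
                )
                drift = response["StackResourceDrift"]
                results[logical_id] = drift["StackResourceDriftStatus"]
            except Exception as exc:  # broad on purpose: keep going on failures
                results[logical_id] = f"ERROR: {exc}"
    return results


# Usage (requires boto3 and AWS credentials; stack name from the PoC):
#   import boto3
#   print(drift_per_resource(boto3.client("cloudformation"), "drift-poc"))
```

Each call is synchronous for a single resource, so large stacks may take a while and could run into API throttling.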
As a workaround, we would need to deploy those resources in their own stack and exclude them from monitoring, which might be OK as a temporary solution. But this is not something we want to have permanently on our client's infrastructure.
EDIT: I figured out that, for stack drift detection to work, AWS::IAM::ManagedPolicy requires a Groups, Users or Roles property. So instead of attaching the policy on the User resource, you need to reference the entity in the policy. The problem with AWS::Config::ConfigRule still persists, though.
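To find affected policies across many templates, a small check along these lines might help. This is only a sketch based on the finding above: it assumes the template has already been loaded into a Python dict, and the function name unattached_managed_policies is made up for illustration.

```python
ATTACHMENT_PROPS = ("Groups", "Users", "Roles")


def unattached_managed_policies(template: dict) -> list:
    """Return logical IDs of AWS::IAM::ManagedPolicy resources with no
    non-empty Groups, Users or Roles property."""
    found = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::IAM::ManagedPolicy":
            continue
        props = resource.get("Properties", {})
        if not any(props.get(name) for name in ATTACHMENT_PROPS):
            found.append(logical_id)
    return found


# The PoC template from above, expressed as a dict.
poc = {
    "Resources": {
        "S3Bucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Delete"},
        "DenyAllPolicy": {
            "Type": "AWS::IAM::ManagedPolicy",
            "Properties": {
                "PolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {"Effect": "Deny", "Action": ["*"], "Resource": "*"}
                    ],
                }
            },
        },
    }
}
print(unattached_managed_policies(poc))  # ['DenyAllPolicy']
```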
The role needs at least s3:GetObject permissions to detect drift on S3 buckets. This error even occurs when I run the command with my user, which has AdminAccess
The drift detection worked for a couple of weeks without any problem, but since Friday it has been throwing this error. We didn't change anything on the stack. The PoC also shows that this can be reproduced on any account.
As shown in the PoC, the template deploys successfully and drift detection still throws this error.
Already tried and still the same effect.
Is this also possible with a Config rule?
Already did, and found nothing useful; no error codes were shown either.
Also, the flag --ignore-resource-types does not seem to exist.