CloudFormation stack is stuck on CREATE_IN_PROGRESS due to an EC2 instance

I have a CloudFormation stack YAML file that worked a couple of months ago. Nothing has changed, and I wanted to create a deployment again.
For some reason it now hangs because an EC2 instance is stuck in CREATE_IN_PROGRESS.
On the EC2 page, the instance state is Running and the status check shows 2/2 checks passed.
After a while I get a CloudFormation timeout:

Resource handler returned message: "Exceeded attempts to wait" (RequestToken: 3f...cd, HandlerErrorCode: NotStabilized)

This happens most of the time, but sometimes the deployment succeeds.
I tried to narrow down the issue without success. I suspected a dependency problem.

My YAML basically creates 3 VMs, a VPC with subnets, and traffic mirroring from VM2's interface to VM1's interface.
This is essentially the full YAML; I removed some security groups to keep it shorter.

Resources:
  Ec2VM1:
    Type: 'AWS::EC2::Instance'
    Properties:
      AvailabilityZone: us-east-1b
      KeyName: keypair1
      ImageId: ami-04a81a99f5ec58529
      InstanceType: m5.xlarge
      NetworkInterfaces:
        - DeviceIndex: 0
          SubnetId: !Ref myPubSubnet
          AssociatePublicIpAddress: true
          DeleteOnTermination: true
          GroupSet:
            - !Ref MySecurityGroup

  Ec2VM2:
    Type: 'AWS::EC2::Instance'
    Properties:
      AvailabilityZone: us-east-1b
      KeyName: keypair1
      ImageId: ami-066784287e358dad1
      InstanceType: m5.large
      NetworkInterfaces:
        - DeviceIndex: 0
          NetworkInterfaceId: !Ref VM2Eth1ENI

  Ec2VM3:
    Type: 'AWS::EC2::Instance'
    Properties:
      AvailabilityZone: us-east-1b
      KeyName: keypair1
      ImageId: ami-066784287e358dad1
      InstanceType: t2.small
      NetworkInterfaces:
        - DeviceIndex: 0
          SubnetId: !Ref myPubSubnet
          AssociatePublicIpAddress: true
          PrivateIpAddress: "10.0.1.210"
          DeleteOnTermination: true
          GroupSet:
            - !Ref MySecurityGroup

  Ec2VM1Eth1Attachment:
    Type: 'AWS::EC2::NetworkInterfaceAttachment'
    Properties:
      DeleteOnTermination: true
      DeviceIndex: 1
      InstanceId: !Ref Ec2VM1
      NetworkInterfaceId: !Ref VM1Eth1ENI

  Ec2VM1Eth2Attachment:
    Type: 'AWS::EC2::NetworkInterfaceAttachment'
    Properties:
      DeleteOnTermination: true
      DeviceIndex: 2
      InstanceId: !Ref Ec2VM1
      NetworkInterfaceId: !Ref VM1Eth2ENI

  Ec2VM3Eth1Attachment:
    Type: 'AWS::EC2::NetworkInterfaceAttachment'
    Properties:
      DeleteOnTermination: true
      DeviceIndex: 1
      InstanceId: !Ref Ec2VM3
      NetworkInterfaceId: !Ref VM3Eth1ENI

  MySecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: open ports
      VpcId: !Ref myVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          Description: SSH
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          Description: RDP
          FromPort: 3389
          ToPort: 3389
          CidrIp: 0.0.0.0/0
        - IpProtocol: icmp
          Description: ICMP
          FromPort: -1
          ToPort: -1
          CidrIp: 0.0.0.0/0

  myVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      InstanceTenancy: "default"

  myInternetGateway:
    Type: AWS::EC2::InternetGateway

  myGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref myVPC
      InternetGatewayId: !Ref myInternetGateway

  myRouteTable:
    Type: "AWS::EC2::RouteTable"
    Properties:
      VpcId: !Ref myVPC

  myInternetRoute:
    Type: "AWS::EC2::Route"
    DependsOn:
      - myInternetGateway
      - myGatewayAttachment
    Properties:
      DestinationCidrBlock: "0.0.0.0/0"
      GatewayId: !Ref myInternetGateway
      RouteTableId: !Ref myRouteTable

  myPubSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref myVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: us-east-1b
      MapPublicIpOnLaunch: true

  myPrivSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref myVPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: us-east-1b
      MapPublicIpOnLaunch: true

  mySubnetPubRouteTableAssociation:
    Type: "AWS::EC2::SubnetRouteTableAssociation"
    Properties:
      RouteTableId: !Ref myRouteTable
      SubnetId: !Ref myPubSubnet

  mySubnetPrivRouteTableAssociation:
    Type: "AWS::EC2::SubnetRouteTableAssociation"
    Properties:
      RouteTableId: !Ref myRouteTable
      SubnetId: !Ref myPrivSubnet

  VM1Eth1ENI:
    Type: AWS::EC2::NetworkInterface
    Properties:
      SubnetId: !Ref myPrivSubnet
      GroupSet:
        - !Ref MySecurityGroup

  VM1Eth2ENI:
    Type: AWS::EC2::NetworkInterface
    Properties:
      PrivateIpAddress: "10.0.2.200"
      SubnetId: !Ref myPrivSubnet

  VM2Eth1ENI:
    Type: AWS::EC2::NetworkInterface
    Properties:
      SubnetId: !Ref myPrivSubnet
      GroupSet:
        - !Ref MySecurityGroup

  VM3Eth1ENI:
    Type: AWS::EC2::NetworkInterface
    Properties:
      PrivateIpAddress: "10.0.2.211"
      SubnetId: !Ref myPrivSubnet
      GroupSet:
        - !Ref MySecurityGroup

  myTrafficMirrorFilter:
    Type: "AWS::EC2::TrafficMirrorFilter"
    Properties:
      NetworkServices:
        - "amazon-dns"

  myTrafficMirrorFilterRule:
    Type: "AWS::EC2::TrafficMirrorFilterRule"
    Properties:
      TrafficMirrorFilterId: !Ref myTrafficMirrorFilter
      TrafficDirection: ingress
      RuleNumber: 100
      DestinationCidrBlock: "0.0.0.0/0"
      SourceCidrBlock: "0.0.0.0/0"
      RuleAction: accept

  myNetworkInterfaceTarget:
    Type: "AWS::EC2::TrafficMirrorTarget"
    DependsOn:
      - Ec2VM1
      - Ec2VM1Eth1Attachment
    Properties:
      NetworkInterfaceId: !Ref VM1Eth1ENI

  myTrafficMirrorSession:
    Type: "AWS::EC2::TrafficMirrorSession"
    DependsOn: Ec2VM2
    Properties:
      NetworkInterfaceId: !Ref VM2Eth1ENI
      TrafficMirrorTargetId: !Ref myNetworkInterfaceTarget
      TrafficMirrorFilterId: !Ref myTrafficMirrorFilter
      SessionNumber: 1
      VirtualNetworkId: 9797

1 Answer
Accepted Answer

The issue you're experiencing is likely due to CloudFormation waiting for the EC2 instance to reach a stable state, which is not happening within the timeout period.

Here are a few potential solutions:

  1. Increase the timeout: AWS::EC2::Instance has no Timeout property of its own, but you can attach a CreationPolicy so that CloudFormation waits, up to a time you choose, for a success signal from the instance (for example, sent with cfn-signal from user data). For example:

     Ec2VM1:
       Type: 'AWS::EC2::Instance'
       CreationPolicy:
         ResourceSignal:
           Count: 1
           Timeout: PT60M # wait up to 60 minutes for the signal
       Properties:
         ...

  2. Add a dependency: Make sure that the EC2 instance is dependent on the network interface attachment. You can do this by adding a DependsOn attribute to your EC2 instance resource. For example:

     Ec2VM1:
       Type: 'AWS::EC2::Instance'
       DependsOn: Ec2VM1Eth1Attachment
       Properties:
         ...

  3. Check network interface status: Ensure that the network interface is in a stable state before creating the EC2 instance. You can do this by adding a WaitCondition resource to your template. For example:

     WaitCondition:
       Type: 'AWS::CloudFormation::WaitCondition'
       Properties:
         Handle: !Ref WaitConditionHandle
         Timeout: 300 # 5 minutes

     WaitConditionHandle:
       Type: 'AWS::CloudFormation::WaitConditionHandle'

  4. Verify instance status: Check the instance status using the AWS CLI or SDK before creating the next resource. You can use a custom resource to achieve this. For example:

     CustomResource:
       Type: 'AWS::CloudFormation::CustomResource'
       Properties:
         ServiceToken: !GetAtt 'LambdaFunction.Arn' # a Lambda function you define yourself
         InstanceId: !Ref Ec2VM1

  5. Split the template: If the issue persists, try splitting the template into smaller parts and deploying them separately. This can help identify which resource is causing the issue.
EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • Thank you for your help!

    • "Increase the timeout" -> It actually took around 40 minutes to reach this timeout, whereas every successful deployment finishes within several minutes, so I don't think allowing more time would help.
    • "Add a dependency" -> I think it's the other way around. The EC2 instance gets created first, and only then is the NetworkInterfaceAttachment created, because it depends on the instance. This is its definition:
      Ec2VM1Eth1Attachment:
        Type: 'AWS::EC2::NetworkInterfaceAttachment'
        Properties:
          DeleteOnTermination: true
          DeviceIndex: 1
          InstanceId: !Ref Ec2VM1
          ...
    

    You can see that InstanceId refers to Ec2VM1. I actually tried your suggestion and got a circular dependency error, which is how I know.

    • "Check network interface status" -> Shouldn't that happen automatically given how I wrote the YAML?
      The attachment automatically depends on both the interface and the VM. The VM starts, then the attachment gets created.
  • And you're correct again that the attachment should automatically depend on the interface and the VM, since you've specified the InstanceId and NetworkInterfaceId properties in the attachment resource.

    In that case, the issue might be related to the timing of the resource creation. CloudFormation might be trying to create the next resource before the attachment is fully complete.

    Let's try a few more things:

    1. Add a WaitCondition: Add a WaitCondition resource to wait for the attachment to be fully created before proceeding with the next resource.
    2. Use DependsOn: Use the DependsOn attribute to specify that the next resource depends on the attachment resource.
    3. Check CloudFormation logs: Check the CloudFormation logs to see if there are any errors or warnings related to the attachment resource.
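
    A minimal sketch of the WaitCondition idea, under stated assumptions: the names VM1ReadyHandle and VM1ReadyCondition are hypothetical, the AMI is assumed to have curl and to run user data as a shell script, and the signal only shows that the instance has booted, not that the operating system already sees the attached interface. Resources that should wait list the condition in their DependsOn.

    ```yaml
    Resources:
      # Hypothetical resource: !Ref on a handle returns a presigned URL to signal.
      VM1ReadyHandle:
        Type: 'AWS::CloudFormation::WaitConditionHandle'

      # Hypothetical resource: starts waiting once the attachment has been created.
      VM1ReadyCondition:
        Type: 'AWS::CloudFormation::WaitCondition'
        DependsOn: Ec2VM1Eth1Attachment
        Properties:
          Handle: !Ref VM1ReadyHandle
          Timeout: '300'   # seconds; creation fails if no signal arrives in time
          Count: 1

      Ec2VM1:
        Type: 'AWS::EC2::Instance'
        Properties:
          # ... existing properties from the question, unchanged ...
          UserData:
            Fn::Base64: !Sub |
              #!/bin/bash
              # Assumption: curl is installed on the AMI.
              curl -X PUT -H 'Content-Type:' \
                --data-binary '{"Status":"SUCCESS","Reason":"Boot complete","UniqueId":"vm1","Data":"ready"}' \
                "${VM1ReadyHandle}"

      myNetworkInterfaceTarget:
        Type: 'AWS::EC2::TrafficMirrorTarget'
        DependsOn:
          - Ec2VM1
          - Ec2VM1Eth1Attachment
          - VM1ReadyCondition   # do not create the target until the signal arrives
        Properties:
          NetworkInterfaceId: !Ref VM1Eth1ENI
    ```

    A WaitCondition on its own delays nothing; only resources that depend on it are held back, which is why the traffic mirror target lists it above.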
  • OK, I wanted to understand more about WaitCondition because I'm not familiar with it.
    But I just saw in the CloudFormation logs that, for some reason, my network attachments are being created before the EC2 instance, even though, as I showed in the YAML, InstanceId: !Ref Ec2VM1 should force the opposite order.
    What I did was add an explicit DependsOn on that same EC2 instance to each of the network attachments, as you mentioned. So far the deployment no longer fails. Is this an AWS bug?
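
    The workaround described in this comment might look like the sketch below, assuming the DependsOn was added at the resource level of each of the three attachments from the question. A !Ref to the instance normally already implies this ordering, so the explicit attribute should be redundant; it is shown only because it is what reportedly stopped the hang.

    ```yaml
    Resources:
      Ec2VM1Eth1Attachment:
        Type: 'AWS::EC2::NetworkInterfaceAttachment'
        DependsOn: Ec2VM1          # explicit ordering on top of the implicit !Ref dependency
        Properties:
          DeleteOnTermination: true
          DeviceIndex: 1
          InstanceId: !Ref Ec2VM1
          NetworkInterfaceId: !Ref VM1Eth1ENI

      Ec2VM1Eth2Attachment:
        Type: 'AWS::EC2::NetworkInterfaceAttachment'
        DependsOn: Ec2VM1
        Properties:
          DeleteOnTermination: true
          DeviceIndex: 2
          InstanceId: !Ref Ec2VM1
          NetworkInterfaceId: !Ref VM1Eth2ENI

      Ec2VM3Eth1Attachment:
        Type: 'AWS::EC2::NetworkInterfaceAttachment'
        DependsOn: Ec2VM3
        Properties:
          DeleteOnTermination: true
          DeviceIndex: 1
          InstanceId: !Ref Ec2VM3
          NetworkInterfaceId: !Ref VM3Eth1ENI
    ```

    DependsOn sits next to Type and Properties, not inside Properties. Placing it on the attachment rather than on the instance avoids the circular dependency mentioned earlier in the thread.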
