Terraform Auto Scaling Group with create_before_destroy - timeout error


My Terraform configuration with an Auto Scaling group is shown below.

My goal: application updates with zero downtime, using only one EC2 instance.

Health check shows the new instance is healthy: everything works fine when updating the AMI, then the launch template, creating a new Auto Scaling group, deleting the old one, and replacing the EC2 instance in the target groups.

Health check shows the new instance is unhealthy: I wanted to see what happens if the new AMI is broken and the health check reports the EC2 instance as unhealthy. Result: Terraform terminates after the timeout with this error:

timeout while waiting for state to become 'ok' (last state: 'want at least 1 healthy instance(s) registered to Load Balancer, have 0', timeout: 5m0s)

This is still acceptable, but worst of all it leaves behind the newly created ASG and EC2 instance, which I have to clean up manually. Is there any way, while Terraform is running, to detect that the EC2 instance is still unhealthy after some time, abort its creation, and destroy it?

resource "aws_launch_template" "foo" {

  name_prefix = "demo-app-${data.aws_ami.debian.id}"
  ebs_optimized = true
  image_id = data.aws_ami.debian.id
  instance_type = "t3.micro"

  iam_instance_profile {
    name = aws_iam_instance_profile.test_profile.name
  }

  monitoring {
    enabled = true
  }

  vpc_security_group_ids = [module.sg.ec2_security_group_id]

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name = "test"
    }
  }
  user_data = filebase64("setup_app.sh")

  lifecycle {
    create_before_destroy = true
  }
}


resource "aws_autoscaling_group" "worker" {
  name = "${aws_launch_template.foo.name}-asg-test3"

  min_size             = 1
  desired_capacity     = 1
  max_size             = 1
  min_elb_capacity     = 1
  wait_for_capacity_timeout = "5m"
  health_check_type    = "EC2"
  force_delete         = true
  vpc_zone_identifier  = module.vpc.private_subnets
  target_group_arns         = [aws_lb_target_group.main_tg.arn]

  launch_template {
    id      = aws_launch_template.foo.id
    version = "$Latest"
  }

  lifecycle {
    create_before_destroy = true
    prevent_destroy       = false
  }

}

I tried something like this inside the aws_autoscaling_group resource, but it doesn't work:

  provisioner "local-exec" {
    when = create
    command = "timeout 2m terraform destroy -target=aws_autoscaling_group.worker -auto-approve -force"
  }
1 Answer

Option 1: You can set timeouts for the create and delete operations on a specific resource in Terraform, like this. Add this block to your Auto Scaling group resource and check whether it changes the behavior:

  timeouts {
    create = "60m"
    delete = "2h"
  }
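
For this particular error, note that the 5m0s in the message comes from wait_for_capacity_timeout on the ASG itself (together with min_elb_capacity), so adjusting that argument directly may be the more targeted fix. A minimal sketch, with the rest of the resource unchanged from the question:

resource "aws_autoscaling_group" "worker" {
  # ... other arguments as in the question ...

  # Terraform waits this long for min_elb_capacity healthy instances.
  # Raise it for slow-booting AMIs, or set it to "0" to skip the
  # capacity wait entirely (the apply then succeeds even if the new
  # instance never becomes healthy).
  min_elb_capacity          = 1
  wait_for_capacity_timeout = "10m"
}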

Option 2: If Option 1 doesn't work, you could have a Lambda function triggered by the ASG's termination lifecycle hook to clean up the old ASG. The old ASG can be identified from its metadata (for example, its Name tag). A sketch of the Terraform wiring follows the link below.

Check this for steps - https://docs.aws.amazon.com/autoscaling/ec2/userguide/tutorial-lifecycle-hook-lambda.html
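
A minimal sketch of that wiring, assuming the cleanup Lambda (referenced here as aws_lambda_function.asg_cleanup) and its IAM role are defined elsewhere; all names below are illustrative:

# Emit a lifecycle event whenever an instance in the ASG starts terminating.
resource "aws_autoscaling_lifecycle_hook" "terminating" {
  name                   = "asg-cleanup-hook"
  autoscaling_group_name = aws_autoscaling_group.worker.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  heartbeat_timeout      = 300
  default_result         = "CONTINUE"
}

# Route the terminate lifecycle event to the Lambda via EventBridge.
resource "aws_cloudwatch_event_rule" "terminate_event" {
  name = "asg-terminate-lifecycle"
  event_pattern = jsonencode({
    source        = ["aws.autoscaling"]
    "detail-type" = ["EC2 Instance-terminate Lifecycle Action"]
    detail = {
      AutoScalingGroupName = [aws_autoscaling_group.worker.name]
    }
  })
}

resource "aws_cloudwatch_event_target" "invoke_cleanup" {
  rule = aws_cloudwatch_event_rule.terminate_event.name
  arn  = aws_lambda_function.asg_cleanup.arn # assumed to exist elsewhere
}

# Allow EventBridge to invoke the Lambda.
resource "aws_lambda_permission" "allow_events" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.asg_cleanup.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.terminate_event.arn
}

The Lambda itself would list the Auto Scaling groups, match the stale one on its metadata (such as the Name tag), and delete it; the linked tutorial covers the function code and permissions.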

