Terraform Auto Scaling Group with create_before_destroy - timeout error

My Terraform configuration with an aws_autoscaling_group is shown below.

My goal: application updates without downtime with only 1 EC2 instance.

When the health check reports the new instance as healthy: everything works as expected. The AMI is updated, then the launch template, a new Auto Scaling group is created, the old one is deleted, and the EC2 instance is replaced in the target groups.

When the health check reports the new instance as unhealthy: I wanted to see what happens if the new AMI is broken and the health check marks the EC2 instance as unhealthy. Result: Terraform exits after the timeout with this error:

timeout while waiting for state to become 'ok' (last state: 'want at least 1 healthy instance(s) registered to Load Balancer, have 0', timeout: 5m0s)

That by itself is acceptable, but the worst part is that it leaves behind the newly created ASG and EC2 instance, which I have to clean up manually. Is there a way, while Terraform is running, to detect that the EC2 instance is still unhealthy after some time, abort its creation, and destroy it?

resource "aws_launch_template" "foo" {
  name_prefix   = "demo-app-${data.aws_ami.debian.id}"
  ebs_optimized = true
  image_id      = data.aws_ami.debian.id
  instance_type = "t3.micro"

  iam_instance_profile {
    name = aws_iam_instance_profile.test_profile.name
  }

  monitoring {
    enabled = true
  }

  vpc_security_group_ids = [module.sg.ec2_security_group_id]

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name = "test"
    }
  }
  user_data = filebase64("setup_app.sh")

  lifecycle {
    create_before_destroy = true
  }
}


resource "aws_autoscaling_group" "worker" {
  name = "${aws_launch_template.foo.name}-asg-test3"

  min_size                  = 1
  desired_capacity          = 1
  max_size                  = 1
  min_elb_capacity          = 1
  wait_for_capacity_timeout = "5m"
  health_check_type         = "EC2"
  force_delete              = true
  vpc_zone_identifier       = module.vpc.private_subnets
  target_group_arns         = [aws_lb_target_group.main_tg.arn]

  launch_template {
    id      = aws_launch_template.foo.id
    version = "$Latest"
  }

  lifecycle {
    create_before_destroy = true
    prevent_destroy       = false
  }

}

I tried something like this inside the resource aws_autoscaling_group but it doesn't work:

  provisioner "local-exec" {
    when = create
    command = "timeout 2m terraform destroy -target=aws_autoscaling_group.worker -auto-approve -force"
  }

Asked 1 year ago, viewed 928 times

1 Answer

Option 1: You can set timeouts for the create and delete operations of a specific resource in Terraform. Add this to your Auto Scaling group and check whether it helps:

  timeouts {
    create = "60m"
    delete = "2h"
  }
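
For context, here is a rough sketch of where such a block would sit in the Auto Scaling group from the question. The operations accepted inside `timeouts` depend on the resource and provider version, so it is worth checking the aws_autoscaling_group documentation before relying on `create`. This sketch only assumes `delete`, and it lengthens the capacity wait through `wait_for_capacity_timeout`, which the question already uses. The durations are placeholder values, not recommendations.

```hcl
# Sketch only: names mirror the question; durations are assumed values.
resource "aws_autoscaling_group" "worker" {
  name                = "${aws_launch_template.foo.name}-asg-test3"
  min_size            = 1
  desired_capacity    = 1
  max_size            = 1
  vpc_zone_identifier = module.vpc.private_subnets
  target_group_arns   = [aws_lb_target_group.main_tg.arn]

  # Terraform waits up to this long for a healthy instance
  # behind the load balancer while creating the group.
  min_elb_capacity          = 1
  wait_for_capacity_timeout = "10m" # assumed value

  launch_template {
    id      = aws_launch_template.foo.id
    version = "$Latest"
  }

  # Operation timeouts for the resource itself.
  timeouts {
    delete = "15m" # assumed value
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

Note that a longer timeout only changes how long Terraform waits. It does not by itself remove a group whose instance never becomes healthy.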

Option 2: If Option 1 doesn't work, you could have a Lambda function triggered by the termination event of an ASG lifecycle hook to clean up the old ASG. The old ASG can be identified from its metadata (for example, its Name tag).

See this tutorial for the steps: https://docs.aws.amazon.com/autoscaling/ec2/userguide/tutorial-lifecycle-hook-lambda.html
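
A minimal Terraform sketch of the wiring for Option 2 might look like the following. It assumes a Lambda function already exists as `aws_lambda_function.cleanup`, which is a hypothetical name not present in the question, and it routes the lifecycle event through an EventBridge rule. The resource names, hook name, and heartbeat value are illustrative only, and the cleanup logic inside the Lambda is not shown.

```hcl
# Hypothetical: aws_lambda_function.cleanup is assumed to be defined elsewhere.

# Pause terminating instances so the event can be acted on.
resource "aws_autoscaling_lifecycle_hook" "terminating" {
  name                   = "worker-terminating" # illustrative name
  autoscaling_group_name = aws_autoscaling_group.worker.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  default_result         = "CONTINUE"
  heartbeat_timeout      = 300 # seconds, assumed value
}

# Match the termination lifecycle event for this ASG only.
resource "aws_cloudwatch_event_rule" "asg_terminate" {
  name = "worker-asg-terminate" # illustrative name

  event_pattern = jsonencode({
    source        = ["aws.autoscaling"]
    "detail-type" = ["EC2 Instance-terminate Lifecycle Action"]
    detail = {
      AutoScalingGroupName = [aws_autoscaling_group.worker.name]
    }
  })
}

# Send matching events to the cleanup Lambda.
resource "aws_cloudwatch_event_target" "cleanup" {
  rule = aws_cloudwatch_event_rule.asg_terminate.name
  arn  = aws_lambda_function.cleanup.arn
}

# Allow EventBridge to invoke the Lambda.
resource "aws_lambda_permission" "allow_events" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.cleanup.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.asg_terminate.arn
}
```

Because the ASG name in the question changes with each AMI, the hook and rule here would be recreated along with the group.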

Answered 1 year ago