How do I troubleshoot build pipeline timeout errors in Image Builder?

I want to troubleshoot build pipeline timeout errors that I receive in EC2 Image Builder.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

To troubleshoot build pipeline timeout errors, take the following actions based on when the timeout occurs. For information about how to monitor timeouts, see Monitor Image Builder logs with Amazon CloudWatch Logs.

The timeout occurs when the build verifies SSM Agent availability

If the timeout occurs when the build verifies the AWS Systems Manager Agent (SSM Agent), then you might receive one of the following error messages:

"Workflow Execution ID: failed with reason: ExpectationNotMet. ssm:*CommandInvocations returned terminal state Failed in workflow step LaunchBuildInstance."

"Workflow Execution ID: failed with reason: An error occurred (InvalidInstanceId) when calling the SendCommand operation: Instances [[i-1a1b1c1d1e1f1g1h1]] not in a valid state for account in workflow step LaunchBuildInstance."

"Workflow Execution ID: failed with reason: ExpectationNotMet. ec2:DescribeInstanceStatus did not meet terminal states: [['passed']] after 100 attempts. Reason: Timeout. in workflow step LaunchBuildInstance."

To troubleshoot these errors, take the following actions.

Validate that the Amazon EC2 instance has the required IAM permissions

Make sure that your Amazon Elastic Compute Cloud (Amazon EC2) instance has the required AWS Identity and Access Management (IAM) permissions. Attach the AmazonSSMManagedInstanceCore managed policy to the IAM role that you use for Image Builder. To identify the role that you use for Image Builder, check the infrastructure configuration details. Also, make sure that the AWSServiceRoleForImageBuilder role can use the AWS Key Management Service (AWS KMS) key that's specified in the image recipe's block device mapping.
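
To confirm the policy attachment from the command line, you can use a check similar to the following one. This is a minimal sketch: the function name has_ssm_core_policy and the role name ExampleImageBuilderRole are placeholders and aren't part of Image Builder.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given IAM role has the
# AmazonSSMManagedInstanceCore managed policy attached.
has_ssm_core_policy() {
    role_name="$1"
    # --output text prints the policy names separated by tabs,
    # so split them into one name per line before matching.
    aws iam list-attached-role-policies \
        --role-name "$role_name" \
        --query "AttachedPolicies[].PolicyName" \
        --output text | tr '\t' '\n' | grep -qx "AmazonSSMManagedInstanceCore"
}

# Example usage (ExampleImageBuilderRole is a placeholder role name):
# has_ssm_core_policy ExampleImageBuilderRole || \
#     aws iam attach-role-policy \
#         --role-name ExampleImageBuilderRole \
#         --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
```

Note: Replace ExampleImageBuilderRole with the role from your infrastructure configuration's instance profile.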

Make sure that SSM Agent can reach the endpoints

Check the following settings based on your configuration:

  • If you use a public subnet with an internet gateway, then configure the subnet to automatically assign a public IPv4 address.
  • If you use a private subnet with a NAT gateway, then configure the NAT gateway to use a public subnet.
  • If you use a private subnet with Amazon Virtual Private Cloud (Amazon VPC) endpoints, then configure private endpoints for AWS Systems Manager.
  • Confirm that the security group and network access control lists (network ACLs) allow inbound connections on ephemeral ports 1024-65535, and outbound connections on port 443.
  • If you use a private subnet with AWS PrivateLink endpoints, then confirm that the Amazon VPC endpoint's security group allows inbound connections on port 443.
    Note: To allow the inbound connections, use the subnet or the Amazon VPC CIDR address.
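
The subnet and endpoint settings in the preceding list can be inspected with the AWS CLI. The following sketch assumes that a private subnet needs the ssm, ssmmessages, and ec2messages interface endpoints, which are the endpoints commonly listed for Systems Manager. Confirm the exact set for your SSM Agent version. The function names are hypothetical.

```shell
#!/bin/sh
# Prints "True" if the subnet automatically assigns public IPv4 addresses.
subnet_auto_assigns_public_ip() {
    aws ec2 describe-subnets \
        --subnet-ids "$1" \
        --query "Subnets[0].MapPublicIpOnLaunch" \
        --output text
}

# Lists the service names of the VPC endpoints in a VPC, one per line.
list_vpc_endpoint_services() {
    aws ec2 describe-vpc-endpoints \
        --filters "Name=vpc-id,Values=$1" \
        --query "VpcEndpoints[].ServiceName" \
        --output text | tr '\t' '\n'
}

# Prints the Systems Manager endpoint service names (assumed set, see above)
# that are missing from the VPC. Prints nothing when all are present.
missing_ssm_endpoints() {
    region="$1"
    vpc_id="$2"
    present="$(list_vpc_endpoint_services "$vpc_id")"
    for svc in ssm ssmmessages ec2messages; do
        echo "$present" | grep -qxF "com.amazonaws.$region.$svc" \
            || echo "com.amazonaws.$region.$svc"
    done
}

# Example usage (placeholder IDs):
# subnet_auto_assigns_public_ip subnet-0example
# missing_ssm_endpoints us-east-1 vpc-0example
```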

For more troubleshooting steps, see Why is my image build pipeline failing with the error "Step timed out while step is verifying the Systems Manager Agent availability on the target instance(s)" in Image Builder?

The timeout occurs when the build downloads the AWS CLI

If the EC2 instance's Amazon Machine Image (AMI) doesn't have the AWS CLI, then the bootstrap script installs the AWS CLI over the internet. However, if you build on a private subnet that doesn't allow internet connectivity, then the build times out at the ApplyBuildComponents step. For a container build, the timeout occurs at the BootstrapBuildInstance step.

To resolve this timeout issue, allow internet connectivity on the subnet through a NAT gateway or internet gateway. Or, create a custom AMI that has the AWS CLI preinstalled.

The timeout occurs at the LaunchBuildInstance step

You must use unique root device names in your build instance. As a result, it's not a best practice to name your root device /dev/xvda or /dev/sda1. If you use a duplicate root device name when you use the CreateImageRecipe API, then you receive a timeout with the following error message:

"Workflow Execution ID: failed with reason: ExpectationNotMet. ec2:DescribeInstanceStatus did not meet terminal states: [['passed']] after 100 attempts. Reason: Timeout. in workflow step LaunchBuildInstance."

Note: If you use an instance that's built on the AWS Nitro System, or is a Xen instance type, then you can use a duplicate device name.

To check the device name of the source AMI, run the following describe-images AWS CLI command:

aws ec2 describe-images \
    --region exampleregion \
    --image-ids exampleami

Note: Replace exampleregion with your AWS Region and exampleami with the source AMI ID.

In the output, check the value for RootDeviceName. Make sure that you use the same device name in the Image Builder recipe.
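
To compare the two values without reading the full output, you can narrow each response with --query. The following is a sketch only: the function names are hypothetical, and the AMI ID and recipe ARN are placeholders.

```shell
#!/bin/sh
# Prints the root device name of the source AMI, for example /dev/xvda.
ami_root_device_name() {
    aws ec2 describe-images \
        --region "$1" \
        --image-ids "$2" \
        --query "Images[0].RootDeviceName" \
        --output text
}

# Prints the device names from the recipe's block device mappings,
# one per line.
recipe_device_names() {
    aws imagebuilder get-image-recipe \
        --region "$1" \
        --image-recipe-arn "$2" \
        --query "imageRecipe.blockDeviceMappings[].deviceName" \
        --output text | tr '\t' '\n'
}

# Example usage (placeholder values):
# root="$(ami_root_device_name exampleregion exampleami)"
# recipe_device_names exampleregion examplerecipearn | grep -qxF "$root" \
#     || echo "The recipe doesn't map the AMI root device $root"
```

Note: Replace examplerecipearn with the Amazon Resource Name (ARN) of your image recipe.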

The timeout occurs when the build gets the Image Builder components

If you build on a private subnet and Image Builder can't connect to download components, then you receive the following error message:

"failed with reason: failed to download the EC2 Image Builder Component, operation error imagebuilder: GetComponent, exceeded maximum number of attempts, 3, dial tcp i/o timeout."

To resolve the preceding error, make sure that your security groups and DNS resolution allow communication on the required ports. Or, create an interface Amazon VPC endpoint for Image Builder in the same VPC and subnet that you use in your Image Builder infrastructure configuration. Also, verify that your configuration adheres to the Image Builder requirements.
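
If you choose the interface endpoint option, then the call might look like the following sketch. It assumes that the Image Builder service name follows the usual com.amazonaws.region.imagebuilder pattern and that you want private DNS turned on. The function name and all IDs are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: creates an interface VPC endpoint for Image Builder
# in the VPC and subnet that the infrastructure configuration uses.
create_imagebuilder_endpoint() {
    region="$1"
    vpc_id="$2"
    subnet_id="$3"
    security_group_id="$4"
    aws ec2 create-vpc-endpoint \
        --region "$region" \
        --vpc-id "$vpc_id" \
        --vpc-endpoint-type Interface \
        --service-name "com.amazonaws.$region.imagebuilder" \
        --subnet-ids "$subnet_id" \
        --security-group-ids "$security_group_id" \
        --private-dns-enabled
}

# Example usage (placeholder values):
# create_imagebuilder_endpoint exampleregion vpc-0example subnet-0example sg-0example
```

Note: The security group that you attach to the endpoint must allow inbound connections on port 443 from the build instance.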

The timeout occurs when the build retrieves the mirrorlist

Amazon Linux stores the mirrorlist in an Amazon Simple Storage Service (Amazon S3) bucket. If you build on a private subnet that doesn't have access to Amazon S3, then you receive the following timeout error message:

"Could not retrieve mirrorlist; error was 12: Timeout was reached."

To resolve this issue, create an Amazon VPC gateway endpoint for Amazon S3. By default, Amazon VPC adds the Amazon S3 prefix list to the route table when you create an endpoint. However, it's a best practice to confirm that the prefix list is in the route table.
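
The following sketch shows one way to create the gateway endpoint and then confirm the route table entry. The function names and IDs are placeholders. The check looks for any prefix list route (pl-...) in the route table, so it doesn't prove that the route belongs to Amazon S3.

```shell
#!/bin/sh
# Hypothetical helper: creates a gateway endpoint for Amazon S3 and
# associates it with the route table of the build subnet.
create_s3_gateway_endpoint() {
    region="$1"
    vpc_id="$2"
    route_table_id="$3"
    aws ec2 create-vpc-endpoint \
        --region "$region" \
        --vpc-id "$vpc_id" \
        --vpc-endpoint-type Gateway \
        --service-name "com.amazonaws.$region.s3" \
        --route-table-ids "$route_table_id"
}

# Succeeds if the route table contains at least one prefix list route.
route_table_has_prefix_list() {
    aws ec2 describe-route-tables \
        --region "$1" \
        --route-table-ids "$2" \
        --query "RouteTables[0].Routes[].DestinationPrefixListId" \
        --output text | tr '\t' '\n' | grep -q '^pl-'
}

# Example usage (placeholder values):
# create_s3_gateway_endpoint exampleregion vpc-0example rtb-0example
# route_table_has_prefix_list exampleregion rtb-0example \
#     || echo "No prefix list route found in rtb-0example"
```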

If you build on an AMI that isn't Amazon Linux, then the mirrorlist isn't stored on Amazon S3. In this scenario, a build timeout might occur when the build retrieves the repository or mirrorlist. Make sure that you allow the repository address or URL in your network firewall or proxy. If the repository or mirrorlist requires internet access, then allow internet connectivity on the subnet through a NAT gateway.

The timeout occurs at the ApplyBuildComponents step

If a build timeout occurs at the ApplyBuildComponents step, then you receive the following error message:

"Workflow Execution ID: failed with reason: ExpectationNotMet. ssm:ListCommandInvocations did not meet terminal states: [['Success']] after 1440 attempts. Reason: Timeout. in workflow step ApplyBuildComponents."

To troubleshoot this issue, analyze the logs that Image Builder sent to the infrastructure's Amazon S3 bucket. For more information, see Review workflow runtime logs on Troubleshoot pipeline builds.

Also, analyze the component logs of the instance that you use to build or test a new image. Before you check the logs, deactivate Terminate instance on failure. To access this feature, complete the following steps:

  1. Open the Image Builder console.
  2. Choose Edit infrastructure.
  3. Choose Edit, and then select Specify settings to troubleshoot issues with building your image.
  4. Choose Instance settings, and then clear Terminate instance on failure.
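
To confirm the current setting from the AWS CLI instead of the console, you can read it from the infrastructure configuration. This is a sketch: the function name is hypothetical, and exampleinfrastructurearn is a placeholder for the ARN of your infrastructure configuration.

```shell
#!/bin/sh
# Prints "True" if Image Builder terminates the build instance on failure,
# or "False" if the instance is kept for troubleshooting.
terminate_on_failure_setting() {
    aws imagebuilder get-infrastructure-configuration \
        --infrastructure-configuration-arn "$1" \
        --query "infrastructureConfiguration.terminateInstanceOnFailure" \
        --output text
}

# Example usage (placeholder ARN):
# terminate_on_failure_setting exampleinfrastructurearn
```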

Note: The detailedoutput.json log file describes the reason that the component failed or timed out. The application.log file provides debug-level troubleshooting information.

Finally, check the timeoutSeconds parameter value in the YAML document for your component. The default value is 7200 seconds (2 hours). To increase the amount of time before Image Builder times out, update the value in your YAML document. A value of -1 sets an infinite timeout.
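
As an illustration, the following sketch writes a component document that sets timeoutSeconds on a single step. The component name, step name, command, and the 10800-second value are all hypothetical. Adapt them to your own document.

```shell
#!/bin/sh
# Hypothetical helper: writes an example component document to the given
# path. The build step may run for up to 10800 seconds (3 hours) instead of
# the 7200-second default.
write_example_component() {
    cat > "$1" <<'EOF'
name: ExampleLongRunningComponent
description: Hypothetical component with a longer step timeout.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: ExampleLongStep
        action: ExecuteBash
        timeoutSeconds: 10800
        inputs:
          commands:
            - echo "long-running work goes here"
EOF
}

# Example usage:
# write_example_component component.yaml
```

Note: The timeoutSeconds value applies to the step where you set it, so set it on each step that needs more time.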

AWS OFFICIAL Updated 2 months ago