Why is my Amazon ECS or Amazon EC2 instance unable to join the cluster?

I can't register my Amazon Elastic Compute Cloud (Amazon EC2) instance with an Amazon Elastic Container Service (Amazon ECS) cluster.

Short description

Your Amazon EC2 instance can't register with or join an Amazon ECS cluster because of one or more of the following reasons:

  • The ECS endpoint can't access the Domain Name System (DNS) hostname of the instance publicly.
  • Your public subnet configurations are incorrect.
  • Your private subnet configurations are incorrect.
  • Your VPC endpoints are incorrectly configured.
  • Your security groups don't allow network traffic.
  • The EC2 instance doesn't have the required AWS Identity and Access Management (IAM) permissions. Or, the ecs:RegisterContainerInstance API call is denied.
  • The instance user data for your ECS container is incorrectly configured.
  • The ECS agent is stopped or not running on the instance.
  • The launch configuration of the Auto Scaling group isn't correct (if your instance is part of an Auto Scaling group).
  • The Amazon Machine Image (AMI) that you use for your instance doesn't meet the prerequisites.

Resolution

Use the AWSSupport-TroubleshootECSContainerInstance AWS Systems Manager runbook to troubleshoot the common issues that are listed in the Short description section. If the runbook's output doesn't provide recommendations, then use the manual troubleshooting approaches in the following Resolution sections.

Use the Systems Manager Automation runbook

Use the AWSSupport-TroubleshootECSContainerInstance runbook to troubleshoot the EC2 instance that fails to register with the ECS cluster. This automation checks whether the following conditions are met:

  • The user data for the instance contains the correct cluster information.
  • The instance profile contains the required permissions.
  • The network is correctly configured.

Note: Be sure to use the AWSSupport-TroubleshootECSContainerInstance runbook in the same AWS Region where your ECS Cluster and EC2 instance are located.

  1. Open the AWS Systems Manager console.
  2. In the navigation pane, under Change Management, choose Automation.
  3. Choose Execute automation.
  4. Choose the Owned by Amazon tab.
  5. Under Automation document, search for TroubleshootECSContainerInstance.
  6. Select the AWSSupport-TroubleshootECSContainerInstance card.
    Note: Select the radio button and not the hyperlinked automation name.
  7. Choose Next.
  8. For Execution automation document, choose Simple execution.
  9. In the Input parameters section, for AutomationAssumeRole, enter the Amazon Resource Name (ARN) of the role that allows Systems Manager Automation to perform actions.
    Note: If you don't specify an IAM role, then Systems Manager Automation uses the permissions of the IAM user or role that runs the runbook. For more information about creating the assume role for Systems Manager Automation, see Method 2: Use IAM to configure roles for Automation. Be sure that the AutomationAssumeRole or the IAM role has the following permissions: ec2:DescribeIamInstanceProfileAssociations, ec2:DescribeInstanceAttribute, ec2:DescribeInstances, ec2:DescribeNetworkAcls, ec2:DescribeRouteTables, ec2:DescribeSecurityGroups, ec2:DescribeSubnets, ec2:DescribeVpcs, ec2:DescribeVpcEndpoints, iam:GetInstanceProfile, iam:GetRole, iam:SimulateCustomPolicy, and iam:SimulatePrincipalPolicy.
  10. For ClusterName, enter the cluster name where the EC2 instance failed to register.
  11. For InstanceId, enter the EC2 Instance ID that failed to register.
  12. Choose Execute.
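
The permissions named in the note to step 9 can be collected into a single IAM policy document. The following is a minimal sketch, not an official policy: the function name write_runbook_policy is hypothetical, and "Resource": "*" is an assumption that you might want to narrow for the iam actions. The sketch doesn't cover the trust relationship that the role also needs.

```shell
# Sketch: write an IAM policy document that contains the 13 permissions
# listed in the note to step 9. The function name is hypothetical.
# Assumption: "Resource": "*" is used for every action.
write_runbook_policy() {
  # $1: path of the JSON file to create
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeIamInstanceProfileAssociations",
        "ec2:DescribeInstanceAttribute",
        "ec2:DescribeInstances",
        "ec2:DescribeNetworkAcls",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:DescribeVpcEndpoints",
        "iam:GetInstanceProfile",
        "iam:GetRole",
        "iam:SimulateCustomPolicy",
        "iam:SimulatePrincipalPolicy"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```

For example, write_runbook_policy runbook-policy.json creates the file, which you can then attach to the role with aws iam put-role-policy and --policy-document file://runbook-policy.json.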

The runbook's output provides troubleshooting steps and recommendations.
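
The console steps above can also be approximated from the AWS CLI. The following is a minimal sketch that assumes the AWS CLI is installed and configured. The function name start_ecs_troubleshooting and the example values are hypothetical. The sketch omits AutomationAssumeRole, so the automation runs with the caller's permissions.

```shell
# Sketch: start the AWSSupport-TroubleshootECSContainerInstance runbook
# from the AWS CLI. The function name is hypothetical.
# Assumption: the AWS CLI is installed and has credentials configured.
start_ecs_troubleshooting() {
  # $1: AWS Region of the cluster and the instance
  # $2: name of the ECS cluster that the instance failed to join
  # $3: ID of the EC2 instance that failed to register
  aws ssm start-automation-execution \
    --region "$1" \
    --document-name "AWSSupport-TroubleshootECSContainerInstance" \
    --parameters "ClusterName=$2,InstanceId=$3" \
    --query AutomationExecutionId \
    --output text
}
```

For example, start_ecs_troubleshooting us-east-1 exampleCluster i-0123456789abcdef0 prints an execution ID. You can pass that ID to aws ssm get-automation-execution --automation-execution-id to read the runbook's output.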

Verify the status of the Amazon ECS agent on the Amazon Linux 2 instance

Run the following command to check whether the Amazon ECS container agent on the instance is running:

sudo systemctl status ecs

If the container agent isn't running on your instance, then run the following command to start the agent:

sudo systemctl start ecs

The output of the command must look similar to the following:

ecs start/running, process 23403
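
A running agent isn't necessarily a registered agent. As a rough additional check, you can ask the agent's local introspection endpoint which cluster it reports. The following is a sketch that assumes curl is installed on the instance and that the agent listens on its default introspection port, 51678. The function name agent_cluster is hypothetical.

```shell
# Sketch: print the cluster name that the local ECS agent reports.
# The function name is hypothetical.
# Assumption: the agent serves introspection data on localhost:51678.
agent_cluster() {
  # Fail if the endpoint doesn't answer, for example because the agent is stopped.
  metadata="$(curl -s --max-time 5 http://localhost:51678/v1/metadata)" || return 1
  # Extract the value of the "Cluster" key from the JSON response.
  printf '%s\n' "$metadata" |
    sed -n 's/.*"Cluster"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Run agent_cluster on the instance and compare the printed name with the cluster that you expect the instance to join. A non-zero exit status means that the endpoint didn't respond.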

Check launch configurations

If the instance that you're launching is part of an Auto Scaling group, then confirm that the Auto Scaling group's launch configuration is correct. For more information, see Step 5 in Refreshing an Amazon ECS container instance cluster with a new AMI.
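
One way to confirm what the launch configuration's user data actually did is to read the agent configuration file on a launched instance. The following is a sketch that assumes the user data writes an ECS_CLUSTER line to /etc/ecs/ecs.config. The function name configured_cluster is hypothetical.

```shell
# Sketch: print the cluster name that the agent configuration file sets.
# The function name is hypothetical.
# Assumption: user data writes ECS_CLUSTER=<name> to /etc/ecs/ecs.config.
configured_cluster() {
  # $1: path to the agent configuration file (defaults to /etc/ecs/ecs.config)
  # If the variable appears more than once, print only the last value.
  sed -n 's/^ECS_CLUSTER=//p' "${1:-/etc/ecs/ecs.config}" | tail -n 1
}
```

If configured_cluster prints nothing, then the file doesn't set ECS_CLUSTER, and the agent tries to join the cluster named default. If it prints a name that differs from your cluster name, then correct the user data in the launch configuration.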

Check the AMI of your instance

If the AMI that you use for the EC2 instance is a copied or custom AMI, then confirm that the instance has the following components:

The Amazon ECS optimized AMIs are preconfigured with these requirements. It's a best practice to use Amazon ECS optimized AMIs unless your application requires a version that's not yet available in that AMI.

Verify the log files

If the issue still persists, then use Amazon ECS logs collector to collect the logs, and then review the logs to find the cause. You can also check log files on the container host for the container agent and Docker.

To view the log files for the container agent and Docker, run the following commands:

sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**
sudo cat /var/log/docker
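
The agent logs can be long, so it can help to print only the error lines. The following is a sketch that assumes the agent marks errors with either [ERROR] or level=error, which matches the log excerpts shown in this article. The function name agent_log_errors is hypothetical.

```shell
# Sketch: print only the error lines from the ECS agent log files.
# The function name is hypothetical.
# Assumption: error lines contain "[ERROR]" or "level=error".
agent_log_errors() {
  # $1: directory that holds the agent logs (defaults to /var/log/ecs)
  # -h omits file names so that the output contains only log lines.
  grep -h -E 'level=error|\[ERROR\]' "${1:-/var/log/ecs}"/ecs-agent.log*
}
```

On the container instance, run the function in a shell that has permission to read /var/log/ecs. Look for messages such as AccessDeniedException or ThrottlingException in the output.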

Troubleshooting common errors

Error: Launching a new EC2 instance. Status Reason: This account is currently blocked and not recognized as a valid account. Please contact aws-verification@amazon.com if you have questions. Launching EC2 instance failed.

Contact aws-verification@amazon.com. Be sure to mention that you must unblock your account.

Error: re-registering: ClientException: Container instance 12345678910xxxxxxxxxxxx is inactive.\n\tstatus code: 400, request id: 012345678a-012345b-012ab-0a1-9f645f4s5c12" module=agent.go

You get this error when the ECS agent can't register the EC2 container instance with the ECS cluster because the container instance is now inactive. This error is related to the application that's running on the instance. To understand the cause of the error, check the application. If the error persists, then check the ECS agent logs.

Error: A few instances can join the cluster, but other instances with the same configuration can't join the cluster.

This error might occur due to a ThrottlingException that results when a specific API call exceeds the rate limit. To resolve this error, increase the account-level rate limit. Check for throttling on API calls such as RegisterTargets and RegisterContainerInstance.

Error: After changing the instance type, new instances are unable to join the cluster.

This error occurs when the ECS agent is stuck in the Pending state and can't change the instance type. Unlike some EC2 instances, you can't stop the ECS instance, change the instance type, and then start it again. To change the instance type in Amazon ECS, complete the following steps:

  1. Terminate the container instance.
  2. Launch a new container instance with the new instance type. It's a best practice to launch the instance with the Amazon ECS optimized Amazon Linux 2 AMI for your cluster.

Or, you can create a new launch configuration. Then, update the launch configuration in the Auto Scaling group.

For more information, see How do I change my container instance type in Amazon ECS?

Error: Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-00aa11bb22cc33def is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster . status code: 400, request id: 0a123456-7899-10101-a987-6543210deff

-or-
Error: 2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster status code: 400, request id: 0a123456-7899-10101-a987-123456pqrs

These errors occur due to missing IAM permissions. To resolve these errors, review the instructions in Amazon ECS container instance IAM role.

Also, run the AWSSupport-TroubleshootECSContainerInstance runbook to see which permissions are missing from the container instance role.
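
For a quick check from the AWS CLI, you can simulate the denied action against the container instance role. The following is a sketch that assumes the AWS CLI is configured with permission to call iam:SimulatePrincipalPolicy. The function name can_register_container_instance is hypothetical. Pass the IAM role ARN (arn:aws:iam::<account-id>:role/ecsInstanceRole), not the assumed-role ARN that appears in the error message.

```shell
# Sketch: test whether a role's policies allow ecs:RegisterContainerInstance.
# The function name is hypothetical.
# Assumption: the caller is allowed to call iam:SimulatePrincipalPolicy.
# Returns 0 if allowed, 1 if denied, 2 if the simulation call itself failed.
can_register_container_instance() {
  # $1: ARN of the IAM role in the instance profile
  decision="$(aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names ecs:RegisterContainerInstance \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text)" || return 2
  [ "$decision" = "allowed" ]
}
```

A result other than allowed suggests that the role is missing the permission. The simulation evaluates the action against all resources, so a policy that is scoped to specific clusters might need a more targeted check.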

Related information

Create a virtual private cloud

Why are my Amazon ECS container instances with Amazon Linux 1 AMIs disconnected?

Amazon ECS troubleshooting

Creating your own runbooks