Why are my Amazon ECS container instances with Amazon Linux 1 AMIs disconnected?


My container instances for Amazon Elastic Container Service (Amazon ECS) are disconnected.

Short description

Note: Amazon Linux 1 reached its end of life on December 31, 2023. Amazon Linux AMI no longer receives security updates or bug fixes. For more information, see Update on Amazon Linux AMI end-of-life.

Your Amazon ECS container agent might disconnect and reconnect several times in an hour. These events are normal and aren't a cause for concern. However, if your container agent stays disconnected, then the container instance can't operate as part of your ECS cluster. When the agentConnected field for a container instance is false, the agent is disconnected. The following factors can cause this issue:

  • Networking issues prevent communication between the instance and Amazon ECS.
  • The container agent doesn't have the required AWS Identity and Access Management (IAM) permissions to communicate with Amazon ECS endpoints.
  • There are problems with the host or Docker service inside the container instance.
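To see which container instances in a cluster report agentConnected as false, you can query Amazon ECS with the AWS CLI from any workstation. The following is a minimal sketch. It assumes that the AWS CLI is installed and configured with permission for ecs:ListContainerInstances and ecs:DescribeContainerInstances. The function name list_disconnected_instances is hypothetical. Because describe-container-instances accepts at most 100 container instances per call, larger clusters need batching, which this sketch leaves out.

```shell
# Hypothetical helper: print the EC2 instance IDs of container instances in a
# cluster whose agentConnected field is false. Assumes a configured AWS CLI.
list_disconnected_instances() {
  local cluster="$1" arns
  # Collect the container instance ARNs in the cluster (tab-separated text).
  arns=$(aws ecs list-container-instances --cluster "$cluster" \
    --query 'containerInstanceArns[]' --output text) || return 1
  # An empty cluster has nothing to check.
  if [ -z "$arns" ]; then
    return 0
  fi
  # $arns is left unquoted on purpose so that each ARN becomes its own argument.
  # The JMESPath filter keeps only instances whose agent is disconnected.
  aws ecs describe-container-instances --cluster "$cluster" \
    --container-instances $arns \
    --query 'containerInstances[?agentConnected==`false`].ec2InstanceId' \
    --output text
}
```

For example, list_disconnected_instances exampleCluster prints the IDs of the affected instances. You can then connect to those instances to run the checks in the following tasks.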

To identify the cause of the disconnection, complete the following tasks.

Resolution

Note: The following resolution applies to Amazon ECS-optimized Amazon Linux 1 AMIs. For a resolution that applies to Amazon ECS-optimized Amazon Linux 2 AMIs, see How do I troubleshoot a disconnected Amazon ECS agent?

Verify that the Docker service is running on the container instance

Complete the following steps:

  1. To verify that the Docker service is running on the affected container instance, run the following command:

    sudo service docker status 

    The command output is similar to the following:

    docker (pid 23013) is running...

    To start the Docker service if it isn't running, or to restart it, run the following command:

    sudo service docker restart

    Note: Don't run this command on an instance that's running tasks. First, set the container instance to the DRAINING state so that Amazon ECS schedules its service tasks on other container instances. After the tasks have stopped on this instance, restart the Docker service.
    The command output includes lines similar to the following:

    Stopping docker: [  OK  ]
    Starting docker: [  OK  ]

    Note: To verify that the Docker service is running after the restart command, run the following command:

    sudo service docker status
  2. To start the ECS agent, run the following command:

    sudo start ecs
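You can script the draining step from the note in step 1 with the AWS CLI. The following is a minimal sketch. It assumes a configured AWS CLI with permission to call ecs:UpdateContainerInstancesState and ecs:DescribeContainerInstances. The function name drain_and_wait and its default 15-second polling interval are illustrative choices. Draining moves only tasks that belong to a service. Standalone tasks keep running until they finish or you stop them, so the wait loop doesn't return while any of them remain.

```shell
# Hypothetical helper: set a container instance to DRAINING, then wait until
# Amazon ECS reports no running tasks on it. Assumes a configured AWS CLI.
drain_and_wait() {
  local cluster="$1" instance="$2" interval="${3:-15}" running
  # Stop new task placement and ask ECS to move service tasks elsewhere.
  aws ecs update-container-instances-state --cluster "$cluster" \
    --container-instances "$instance" --status DRAINING > /dev/null || return 1
  # Poll the running task count until it reaches zero.
  while true; do
    running=$(aws ecs describe-container-instances --cluster "$cluster" \
      --container-instances "$instance" \
      --query 'containerInstances[0].runningTasksCount' --output text) || return 1
    [ "$running" = "0" ] && return 0
    echo "Waiting: $running task(s) still running on $instance" >&2
    sleep "$interval"
  done
}
```

After drain_and_wait exampleCluster <container-instance-id> returns, restart Docker and start the agent on the instance as described in the preceding steps. Then return the instance to service with aws ecs update-container-instances-state --cluster exampleCluster --container-instances <container-instance-id> --status ACTIVE.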

Verify that the container agent is running on the container instance

To verify that the container agent is running on the affected container instance, run the following command:

sudo status ecs

If the container agent isn't running on your container instance, then run the following command to start the agent:

sudo start ecs

The command output is similar to the following:

ecs start/running, process 23403
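To script this check, for example as part of a host health check, you can test the Upstart status text before you start the agent. The following is a minimal sketch. It assumes that a running agent reports start/running, as in the example output above, and that a stopped agent reports a different state, such as stop/waiting. The function names ecs_status_is_running and ensure_ecs_agent are hypothetical.

```shell
# Hypothetical helper: return 0 if Upstart status text shows the ecs job running.
# Matches the "start/running" goal/state pair from the example output.
ecs_status_is_running() {
  case "$1" in
    *start/running*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: start the container agent only if it isn't running.
# Uses the Upstart "status" and "start" commands on Amazon Linux 1.
ensure_ecs_agent() {
  if ecs_status_is_running "$(sudo status ecs 2>&1)"; then
    echo "ecs agent is already running"
  else
    sudo start ecs
  fi
}
```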

Review log files for the container agent and Docker

If your container instances are still disconnected, then review the log files on the container host for the container agent and Docker.

To output the log files for the container agent and Docker, run the following commands:

sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**
sudo cat /var/log/docker

Note: To collect log information from the container instance, run the Amazon ECS logs collector.
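A full day of agent logs can be long. The following minimal sketch prints only the ERROR, WARN, and CRITICAL lines from one day's hourly agent logs. It assumes the ecs-agent.log.YYYY-MM-DD-HH file naming and the bracketed level format that appear in the example log lines in this article. Other agent versions might format log lines differently. The function name ecs_agent_problems is hypothetical.

```shell
# Hypothetical helper: print ERROR, WARN, and CRITICAL lines from one day's
# container agent logs. DAY is YYYY-MM-DD; LOGDIR defaults to /var/log/ecs.
ecs_agent_problems() {
  local day="$1" logdir="${2:-/var/log/ecs}"
  # Replace the positional parameters with the matching hourly log files;
  # the shell expands the glob in name order, which is also hour order.
  set -- "$logdir"/ecs-agent.log."$day"-*
  # An unmatched glob stays literal, so check that the first file exists.
  if [ ! -e "$1" ]; then
    echo "no container agent logs for $day in $logdir" >&2
    return 2
  fi
  # -h omits file names; the pattern matches bracketed levels like [ERROR].
  grep -h -E '\[(ERROR|WARN|CRITICAL)\]' "$@"
}
```

Reading /var/log/ecs usually requires root, and sudo can't run a shell function directly. To use the function, define it in a root shell (sudo -s). For example, ecs_agent_problems 2019-06-29 prints the problem lines for June 29, 2019. If no logs exist for that date, then the function returns 2.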

Verify that the IAM instance profile has the necessary permissions

If the container agent is still disconnected, then verify that the IAM instance profile associated with the container instance has the necessary IAM permissions.

To review the IAM permissions, complete the following steps:

  1. Use SSH to connect to the instance.

  2. To view the instance metadata for the instance profile that's associated with the instance, run the following command:

    curl http://169.254.169.254/latest/meta-data/iam/info

    The command output is similar to the following:

    {
      "Code" : "Success",
      "LastUpdated" : "2019-06-29T15:47:03Z",
      "InstanceProfileArn" : "arn:aws:iam::1122334455:instance-profile/ecsInstanceRole",
      "InstanceProfileId" : "AIPAJ5WF3LZVY7PLUHV72"
    }
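  3. Verify that the IAM role attached to the instance profile grants the permissions that the container agent needs, such as the permissions in the AWS managed policy AmazonEC2ContainerServiceforEC2Role.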

  4. To check for credential errors, review the container agent log:

    sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**

    Note: The container agent log rotates every hour, and the file name suffix shows the date and hour that the log covers. Update the command with the date and hour when the issue occurred.
    If the container agent doesn't have the necessary permissions, then the log contains errors similar to the following:

    2019-06-29T16:10:09Z [ERROR] Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
        status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
    2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
        status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
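The curl command in step 2 uses IMDSv1, which returns an error on instances that require IMDSv2 session tokens. The following minimal sketch wraps the same metadata lookup with an IMDSv2 token. It also adds two small parsers: one prints the instance profile name from the iam/info document, and the other lists the distinct IAM actions that the agent log reports as denied. The function names are hypothetical, and the parsers assume the JSON and log layouts shown in this article.

```shell
# Hypothetical helper: fetch an instance metadata path with an IMDSv2 token.
# Prints the value, or returns non-zero if either request fails.
imds_get() {
  local md_path="$1" token
  token=$(curl -s -f -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  curl -s -f -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$md_path"
}

# Hypothetical parser: read iam/info JSON on stdin and print the instance
# profile name (the last path segment of InstanceProfileArn).
profile_name_from_iam_info() {
  sed -n 's#.*"InstanceProfileArn" *: *"[^"]*/\([^"/]*\)".*#\1#p'
}

# Hypothetical parser: read agent log lines on stdin and print each distinct
# IAM action that the log reports as not authorized.
denied_actions() {
  grep -o 'not authorized to perform: [^ ]*' \
    | sed 's/^not authorized to perform: //' | sort -u
}
```

For example, imds_get iam/info | profile_name_from_iam_info prints ecsInstanceRole for the sample output in step 2, and imds_get iam/security-credentials/ prints the name of the role that's attached to the profile. If you pipe the agent log into denied_actions, then the function lists each denied action, such as ecs:RegisterContainerInstance. You can then compare those actions with the role's policies, for example with aws iam list-attached-role-policies --role-name ecsInstanceRole.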

Additional troubleshooting steps

If you can't identify the issue with your ECS container instance from this resolution, then use the Amazon ECS logs collector to create an archive of your instance's logs. Then, contact AWS Support.
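The Amazon ECS logs collector is an open-source script in the awslabs/ecs-logs-collector GitHub repository. The following minimal sketch downloads the script into a scratch directory and runs it as root. The raw-file URL is an assumption based on that repository's layout, so confirm it against the Amazon ECS logs collector documentation before use. Check the script's output for the location of the archive. The function name collect_ecs_logs is hypothetical.

```shell
# Hypothetical helper: download the ECS logs collector and run it as root.
collect_ecs_logs() {
  # Assumed URL; verify it against the awslabs/ecs-logs-collector repository.
  local url="https://raw.githubusercontent.com/awslabs/ecs-logs-collector/master/ecs-logs-collector.sh"
  local workdir
  workdir=$(mktemp -d) || return 1
  # -f makes curl fail on HTTP errors instead of saving an error page.
  curl -fsSL -o "$workdir/ecs-logs-collector.sh" "$url" || return 1
  # Run from the scratch directory so that relative output stays in one place.
  ( cd "$workdir" && sudo bash ./ecs-logs-collector.sh ) || return 1
  # Print the scratch directory so that you can find the downloaded script.
  echo "$workdir"
}
```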

Related information

Amazon ECS troubleshooting

Amazon EC2 container instances for Amazon ECS

Amazon ECS container instance IAM role

Viewing Amazon ECS container agent logs

AWS OFFICIAL