How do I troubleshoot connection issues between my Fargate task and other AWS services?

I want to troubleshoot connectivity issues between my AWS Fargate task and an AWS service.

Short description

Applications that run inside a Fargate task with Amazon Elastic Container Service (Amazon ECS) can fail to access other AWS services for the following reasons:

  • Insufficient AWS Identity and Access Management (IAM) permissions
  • Incorrect subnet routes
  • Network access control list (network ACL) restrictions
  • Restrictive security group rules
  • Misconfigured Amazon Virtual Private Cloud (Amazon VPC) endpoints

To resolve these issues, use Amazon ECS Exec to interact with the application container of the Fargate task. If you observe connection timeout errors in the application container logs, then test the connectivity between the Fargate task and the corresponding AWS service.

Resolution

Use ECS Exec to interact with the application container of the Fargate task

1.    Before you use Amazon ECS Exec, complete the prerequisites for using Amazon ECS Exec.

2.    Follow the instructions in Using Amazon ECS Exec to turn on the feature.

3.    Run Amazon ECS Exec to access your application container and check the network and IAM connectivity between the container and AWS service.

Note: Before you run ECS Exec, it's a best practice to set the parameter initProcessEnabled to true. This keeps AWS Systems Manager Agent (SSM Agent) child processes from becoming orphaned. (Optional) Add a sleep command for the application container to keep the container running for a specified time period.

Example:

{
    "taskRoleArn": "ecsTaskRole",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "EC2",
        "FARGATE"
    ],
    "executionRoleArn": "ecsTaskExecutionRole",
    "memory": ".5 gb",
    "cpu": ".25 vcpu",
    "containerDefinitions": [
        {
            "name": "application",
            "image": "application:latest",
            "essential": true,
            "command": ["sleep", "7200"],
            "linuxParameters": {
                "initProcessEnabled": true
            }
        }
    ],
    "family": "ecs-exec-task"
}

If you can't use Exec to access your application container, then run Exec for a new Fargate task that runs on the amazon/aws-cli Docker image. This lets you test the communication between the Fargate task and the AWS service.

Note: The new Fargate task must have the same networking setup (subnets, security groups, and so on) as your application container.

To run a new Fargate task with the amazon/aws-cli Docker image, complete the following steps:

Note: AWS Command Line Interface (AWS CLI) is preinstalled on the amazon/aws-cli image. If AWS CLI isn't installed on your application container, then run the following commands to install it:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

unzip awscliv2.zip

sudo ./aws/install

1.    Create a task definition with amazon/aws-cli as the image for the container. Then, set the entry point to tail, -f, and /dev/null to keep the container in a continuous Running state.

Example task definition:

{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "family": "aws-cli",
    "containerDefinitions": [
        {
            "entryPoint": [
                "tail",
                "-f",
                "/dev/null"
            ],
            "name": "cli",
            "image": "amazon/aws-cli",
            "essential": true
        }
    ],
    "networkMode": "awsvpc",
    "memory": "512",
    "cpu": "256",
    "executionRoleArn": "arn:aws:iam::123456789012:role/EcsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/TaskRole"
}

2.    Create an Amazon ECS service with the newly created task definition and with the same network configuration as the application container:

$ aws ecs create-service --cluster <example-cluster-name> --task-definition <example-task-definition-name> --network-configuration awsvpcConfiguration="{subnets=[example-subnet-XXXXXXX, example-subnet-XXXXXXX],securityGroups=[example-sg-XXXXXXXXXXXX],assignPublicIp=ENABLED}" --enable-execute-command --service-name <example-service-name> --desired-count 1 --launch-type FARGATE --region <example-region>

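Note: Replace example-cluster-name with your cluster name, example-task-definition-name with your task definition name, example-service-name with your service name, and example-region with your AWS Region. Also replace the example subnet and security group IDs with the IDs that your application container uses.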
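
Because the test task is useful only when its network settings match those of the application, it can help to build the --network-configuration value in one place and reuse it. The following is a minimal sketch, not part of the AWS CLI: the function name build_awsvpc_config and the IDs in the usage comment are hypothetical placeholders.

```shell
# Hypothetical helper: print the shorthand value for --network-configuration.
#   $1: comma-separated subnet IDs
#   $2: comma-separated security group IDs
#   $3: ENABLED or DISABLED (value for assignPublicIp)
build_awsvpc_config() {
    printf 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=%s}\n' \
        "$1" "$2" "$3"
}

# Example usage (placeholder IDs, not real resources):
#   aws ecs create-service ... \
#     --network-configuration "$(build_awsvpc_config 'subnet-aaa,subnet-bbb' 'sg-ccc' ENABLED)"
```

Passing the IDs without spaces after the commas keeps the value a single shell word when it is quoted as shown.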

3.    Run Exec to access the Amazon ECS Fargate task container, and run the /bin/sh command against your specified container-name and task-id:

$ aws ecs execute-command --cluster <example-cluster-name> --task <example-task-id> --container <example-container-name> --interactive --command "/bin/sh" --region <example-region>

Note: Replace example-cluster-name with your cluster name, example-task-id with your task ID, example-container-name with your container name, and example-region with your Region.

Test the connectivity between a Fargate task and the corresponding AWS service

Troubleshoot insufficient IAM permissions

Check whether the Fargate task has sufficient IAM permissions to connect to the corresponding AWS service. To find the AWS CLI commands for the required AWS service, see the AWS CLI Command Reference.

Example connectivity test between the Fargate task and Amazon Simple Notification Service (Amazon SNS):

# aws sns list-topics --region <example-region-name>

If you receive the following error, then check the Amazon VPC endpoint policy. Make sure that the policy allows access to perform the necessary actions against the AWS service.

An error occurred (AuthorizationError) when calling the ListTopics operation: User: arn:aws:sts::123456789012:assumed-role/TaskRole/123456789012 is not authorized to perform: SNS:ListTopics on resource: arn:aws:sns:<region-name>:123456789012:* with an explicit deny in a VPC endpoint policy

If you receive the following error, then check the permissions of the Amazon ECS task IAM role. Make sure that the IAM role has the required permissions to perform the required actions on the AWS service.

An error occurred (AuthorizationError) when calling the ListTopics operation: User: arn:aws:sts::123456789012:assumed-role/TaskRole/123456789012 is not authorized to perform: SNS:ListTopics on resource: arn:aws:sns:<region-name>:123456789012:* because no identity-based policy allows the SNS:ListTopics action

Note: If you don't see any error when running AWS CLI commands on the Fargate task, then the required IAM permissions are present for that AWS service.
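
The two error messages above differ only in their final clause, so a small shell function can tell you which policy to inspect. This is a rough sketch that assumes the error text matches the wording shown in this article; the function name classify_auth_error is hypothetical, and other services or AWS CLI versions might phrase the message differently.

```shell
# Hypothetical helper: map an AWS CLI authorization error to the policy to check.
# Assumes the message wording shown in this article.
classify_auth_error() {
    case "$1" in
        *"explicit deny in a VPC endpoint policy"*)
            echo "vpc-endpoint-policy" ;;
        *"no identity-based policy allows"*)
            echo "task-role-policy" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example usage inside the container:
#   err=$(aws sns list-topics --region <example-region-name> 2>&1)
#   classify_auth_error "$err"
```

A result of vpc-endpoint-policy points to the Amazon VPC endpoint policy, and task-role-policy points to the Amazon ECS task IAM role.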

Troubleshoot connection timeout errors

1.    Use telnet to test the network connectivity to your AWS service endpoints from the Fargate task:

# telnet <EXAMPLE-ENDPOINT> <EXAMPLE-PORT>

Note: Replace EXAMPLE-ENDPOINT with your AWS service endpoint name (for example, sns.us-east-1.amazonaws.com) and EXAMPLE-PORT with your AWS service port.

The following example output shows that the endpoint is accessible from the container:

Trying 10.0.1.169...
Connected to sns.us-east-1.amazonaws.com.
Escape character is '^]'.

To confirm that the endpoint name resolves through DNS, run the dig or nslookup command:

# dig <EXAMPLE-ENDPOINT>

# nslookup <EXAMPLE-ENDPOINT>

For a list of Regional AWS service endpoints, see Service endpoints and quotas for AWS services.

Note: If telnet, dig, and nslookup aren't installed in the application container, then run the apt-get update, apt install dnsutils, and apt install telnet commands to install them. For containers based on amazon/aws-cli, run the yum update, yum install telnet, and yum install bind-utils commands to install telnet and the DNS tools.
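
If telnet isn't available and you can't install packages, bash can open a TCP connection on its own through its /dev/tcp redirection. The following is a minimal sketch that assumes bash and the timeout utility exist in the container. The function name check_tcp and the 5-second limit are illustrative choices.

```shell
# Hypothetical helper: test whether a TCP port is reachable, without telnet.
# Assumes bash (for /dev/tcp) and the coreutils timeout command are present.
#   $1: host name or IP address, $2: port
check_tcp() {
    host="$1"
    port="$2"
    # bash -c receives host as $0 and port as $1; fd 3 closes when bash exits.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open: $host:$port"
    else
        echo "closed or timed out: $host:$port"
        return 1
    fi
}

# Example usage:
#   check_tcp sns.us-east-1.amazonaws.com 443
```

The function reports a refused connection and a timeout the same way, so run telnet or review the network configuration to tell the two apart.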

2.    If you receive Connection timed out errors after testing the network connectivity to your AWS service endpoints, then inspect the network configuration:

Run the nslookup command. If the output shows IP addresses from your VPC CIDR range, then traffic is routing through VPC endpoints:

# nslookup sns.us-east-1.amazonaws.com

Non-authoritative answer:
Name:    sns.us-east-1.amazonaws.com
Address: 10.0.1.169
Name:    sns.us-east-1.amazonaws.com
Address: 10.0.2.248

For Connection timed out errors, check the inbound rules of the VPC endpoint security group. Make sure that TCP traffic over port 443 is allowed in the inbound rules from the ECS security group or VPC CIDR. For more information, see How can I troubleshoot connectivity issues over my gateway and interface VPC endpoints?
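
To decide quickly whether an address that nslookup returns is private, and therefore likely an interface VPC endpoint, you can match it against the RFC 1918 ranges. This sketch assumes that your VPC uses an RFC 1918 CIDR block; a VPC with a different CIDR needs its own patterns. The function name is_private_ipv4 is hypothetical.

```shell
# Hypothetical helper: succeed if an IPv4 address is in an RFC 1918 private range
# (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16).
# Assumes the VPC CIDR is taken from one of these ranges.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*)
            return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[0-1].*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example usage with an address from the nslookup output:
#   is_private_ipv4 10.0.1.169 && echo "private address: likely a VPC endpoint"
```

A private address still doesn't prove that the endpoint belongs to your VPC, so compare it with your VPC CIDR before you change security group rules.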

  • If no Amazon VPC endpoints are configured in the Region, then check the routes from your subnets to the internet. For a Fargate task in a public subnet, make sure that the subnet has a default route to the internet gateway. For a Fargate task in a private subnet, make sure that the subnet has a default route to a NAT gateway, AWS PrivateLink, or another source of internet connectivity, or a route to the local VPC CIDR.
  • Make sure that the network ACL allows access to the AWS service.
  • Check the inbound rules of the security group that's attached to the AWS service that you're trying to access with your Fargate task. Make sure that the rules allow ingress traffic over the required ports.
  • Check that the outbound rules of the Fargate task security group allow egress traffic over the required ports to connect to the AWS service.

AWS OFFICIAL (updated a year ago)