How do I troubleshoot errors I receive when performing Amazon ECS Exec on my Fargate tasks?

I want to troubleshoot errors I receive when performing Amazon Elastic Container Service (Amazon ECS) Exec on my AWS Fargate tasks.

Short description

When using Amazon ECS Exec on Fargate tasks, you might receive the following error statements:

  • An error occurred (InvalidParameterException) when calling the ExecuteCommand operation: The execute command failed because execute command was not enabled when the task was run or the execute command agent isn’t running. Wait and try again or run a new task with execute command enabled and try again.
  • An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

To resolve these errors, troubleshoot the InvalidParameterException and TargetNotConnectedException errors.

Resolution

Note: It's a best practice to use AWS CloudShell. CloudShell comes preinstalled with the AWS Systems Manager Session Manager plugin and the AWS Command Line Interface (AWS CLI). If you receive errors when running AWS CLI commands, confirm that you're running a recent version of the AWS CLI.

Important: Replace all example strings in the AWS CLI commands with your values. For example, replace example-cluster-name with the name of your cluster.

Troubleshoot the InvalidParameterException error

If you don't turn on the ExecuteCommand option for your Fargate task, then you receive an InvalidParameterException error. To resolve this issue, complete the following steps:

1.    Check whether the enableExecuteCommand parameter is set to true or false:

aws ecs describe-tasks --cluster <example-cluster-name> --tasks <example-task-id> | grep enableExecuteCommand

2.    If the enableExecuteCommand parameter is false, then set the parameter to true:

aws ecs update-service --cluster <example-cluster-name> --service <example-service> --region <example-region> --enable-execute-command --force-new-deployment

Note: The --force-new-deployment option creates a new deployment that starts new tasks and stops old tasks based on the deployment configuration of the service. For more information, see Rolling updates.
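If you check several tasks at once, then reading the flag from the JSON response can be easier than using grep. The following is a minimal sketch, not an official tool. It assumes that you saved or piped the output of aws ecs describe-tasks with --output json. The function name exec_status and the sample ARN are hypothetical.

```python
import json


def exec_status(describe_tasks_json: str) -> dict:
    """Map each task ARN to its enableExecuteCommand flag.

    Expects the JSON text printed by `aws ecs describe-tasks --output json`.
    A task without the field is treated as not enabled.
    """
    response = json.loads(describe_tasks_json)
    return {
        task["taskArn"]: bool(task.get("enableExecuteCommand", False))
        for task in response.get("tasks", [])
    }


if __name__ == "__main__":
    # Hypothetical sample that mimics the shape of a describe-tasks response.
    sample = json.dumps({
        "tasks": [
            {
                "taskArn": "arn:aws:ecs:us-east-1:111122223333:task/example-cluster-name/example-task-id",
                "enableExecuteCommand": False,
            }
        ]
    })
    for arn, enabled in exec_status(sample).items():
        state = "enabled" if enabled else "not enabled"
        print(f"{arn}: ECS Exec {state}")
```

A task that reports false keeps that value for its lifetime, so you must replace the task, for example with the update-service command in step 2.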

Troubleshoot the TargetNotConnectedException error

To resolve a TargetNotConnectedException error, complete the following steps:

  • Check the required permissions and network configuration.
  • Use Amazon ECS Exec to get into the container with the correct shell.
  • Generate logs for Amazon ECS Exec to identify issues.

Check the required permissions and network configuration

1.    Amazon ECS Exec requires a task IAM role to grant permissions for containers. Use the following policy to add the required SSM permissions for your task IAM role:

{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Effect": "Allow",
           "Action": [
               "ssmmessages:CreateControlChannel",
               "ssmmessages:CreateDataChannel",
               "ssmmessages:OpenControlChannel",
               "ssmmessages:OpenDataChannel"
           ],
           "Resource": "*"
       }
   ]
}

For more information, see Task IAM role.
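To confirm that a policy document lists all four actions, you can compare it against the required set. The following is a rough sketch under stated assumptions: it inspects only the Allow statements of one policy document, treats * and ? as wildcards, and ignores Deny statements, conditions, and resource scoping. It isn't a substitute for IAM policy evaluation. The function name missing_exec_actions is hypothetical.

```python
from fnmatch import fnmatchcase

# The four actions from the task role policy above.
REQUIRED_ACTIONS = {
    "ssmmessages:CreateControlChannel",
    "ssmmessages:CreateDataChannel",
    "ssmmessages:OpenControlChannel",
    "ssmmessages:OpenDataChannel",
}


def missing_exec_actions(policy: dict) -> set:
    """Return the required actions that no Allow statement in the policy covers."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]

    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may appear as a string
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        patterns.extend(action.lower() for action in actions)

    return {
        required
        for required in REQUIRED_ACTIONS
        if not any(fnmatchcase(required.lower(), pattern) for pattern in patterns)
    }
```

An empty result means that the document covers every required action. A non-empty result lists the actions to add to the task IAM role.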

2.    If you're using interface Amazon Virtual Private Cloud (Amazon VPC) endpoints with Amazon ECS, then create the following endpoints for Systems Manager Session Manager:

  • ec2messages.region.amazonaws.com
  • ssm.region.amazonaws.com
  • ssmmessages.region.amazonaws.com

For more information, see Step 6: (Optional) Use AWS PrivateLink to set up a VPC endpoint for Session Manager.
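To see which of these endpoints a VPC already has, you can inspect the output of aws ec2 describe-vpc-endpoints. The following sketch assumes that the endpoint service names follow the com.amazonaws.region.service pattern used in standard AWS Regions, and that only endpoints in the available state count. The function name missing_endpoints is hypothetical, and vpc-abcd is an example ID.

```python
import json

# Services behind the three endpoints listed above.
SESSION_MANAGER_SERVICES = ("ec2messages", "ssm", "ssmmessages")


def missing_endpoints(describe_vpc_endpoints_json: str, region: str, vpc_id: str) -> list:
    """Return the services that have no available endpoint in the given VPC.

    Expects the JSON text printed by
    `aws ec2 describe-vpc-endpoints --output json`.
    """
    response = json.loads(describe_vpc_endpoints_json)
    present = {
        endpoint["ServiceName"]
        for endpoint in response.get("VpcEndpoints", [])
        if endpoint.get("VpcId") == vpc_id
        and endpoint.get("State", "").lower() == "available"
    }
    return [
        service
        for service in SESSION_MANAGER_SERVICES
        # Assumption: service name pattern for standard AWS Regions.
        if f"com.amazonaws.{region}.{service}" not in present
    ]
```

For example, missing_endpoints(output, "us-east-1", "vpc-abcd") returns an empty list when all three endpoints exist and are available in that VPC.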

3.    Run the check-ecs-exec.sh script to confirm that your AWS CLI environment and Amazon ECS cluster or task are ready for Amazon ECS Exec. Make sure that you meet the prerequisites. For more information, see Amazon ECS Exec Checker on the GitHub website.

Note: After you run the check-ecs-exec.sh script, the output shows what you must resolve before you can use ECS Exec.

Example output:

Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/2.11.0 Python/3.11.2 Linux/4.14.255-291-231.527.amzn2.x86_64 exec-env/CloudShell exe/x86_64.amzn.2 prompt/off)
  Session Manager Plugin | OK (1.2.398.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-east-1
Cluster: Fargate-Testing
Task   : ca27e41ea3f54fd1804ca00feffa178d
-------------------------------------------------------------
  Cluster Configuration  | Audit Logging Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::12345678:role/Admin
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | NO
  Container-Level Checks | 
    ----------
      Managed Agent Status - SKIPPED
    ----------
    ----------
      Init Process Enabled (Exec-check:2)
    ----------
         1. Disabled - "nginx"
    ----------
      Read-Only Root Filesystem (Exec-check:2)
    ----------
         1. Disabled - "nginx"
  Task Role Permissions  | arn:aws:iam::12345678:role/L3-session
     ssmmessages:CreateControlChannel: implicitDeny
     ssmmessages:CreateDataChannel: implicitDeny
     ssmmessages:OpenControlChannel: implicitDeny
     ssmmessages:OpenDataChannel: implicitDeny
  VPC Endpoints          | SKIPPED (vpc-abcd - No additional VPC endpoints required)
  Environment Variables  | (Exec-check:2)
       1. container "nginx"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined

The preceding output indicates that ECS Exec isn't turned on for the task and that the task role doesn't have the required SSM permissions.

4.    Check whether you configured IAM user credentials at the container level, such as an access key or secret access key. Credentials that you configure at the container level override the permissions at the task level and cause an error.
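The credential check in step 4 can be sketched against a task definition. The following example assumes that you have the output of aws ecs describe-task-definition with --output json, and that the credentials were set under the three variable names that the example checker output reports. It looks only at the environment and secrets lists of each container. The function name containers_with_static_credentials is hypothetical.

```python
import json

# Variable names reported in the example check-ecs-exec.sh output.
CREDENTIAL_VARIABLES = ("AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def containers_with_static_credentials(task_definition_json: str) -> dict:
    """Map each container name to the credential variables that it defines.

    Containers that define none of the variables are left out of the result.
    """
    response = json.loads(task_definition_json)
    findings = {}
    for container in response["taskDefinition"]["containerDefinitions"]:
        entries = container.get("environment", []) + container.get("secrets", [])
        defined = {entry["name"] for entry in entries}
        found = [name for name in CREDENTIAL_VARIABLES if name in defined]
        if found:
            findings[container["name"]] = found
    return findings
```

An empty dictionary means that no container sets these variables in its definition. Credentials that the application writes to a file or sets at run time don't appear in this check.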

Use Amazon ECS Exec to get into the container with the correct shell

Different base images can include different shells, and a command that names a shell that isn't in the image results in an error. Make sure that you use a shell that exists in your application image.

Run the following command to use ECS Exec to get into the container. Replace example_shell with your shell:

aws ecs execute-command --region <example-region> --cluster <example-cluster> --container <example-container> --task <example-task> --command "<example_shell>" --interactive
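If you don't know which shell the image includes, then you can prepare the same command for more than one candidate. The following sketch only builds the argument list and prints it, and it doesn't call AWS. It assumes /bin/sh as the default because many minimal images include /bin/sh but not /bin/bash. The function name build_exec_command and the example values are hypothetical.

```python
import shlex


def build_exec_command(region: str, cluster: str, container: str, task: str,
                       shell: str = "/bin/sh") -> list:
    """Build the argument list for `aws ecs execute-command` with the given shell."""
    return [
        "aws", "ecs", "execute-command",
        "--region", region,
        "--cluster", cluster,
        "--container", container,
        "--task", task,
        "--command", shell,
        "--interactive",
    ]


if __name__ == "__main__":
    # Print one command per candidate shell; run them by hand in order.
    for candidate in ("/bin/bash", "/bin/sh"):
        args = build_exec_command(
            "us-east-1", "example-cluster", "example-container", "example-task", candidate
        )
        print(shlex.join(args))
```

Because the arguments are a list, you can also pass them to subprocess.run without shell quoting concerns.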

Generate logs for Amazon ECS Exec to identify issues

Generate SSM Agent logs to determine why ECS Exec isn't working within your Fargate task. Set the following command in the environment section of the container definition (the command parameter in the JSON task definition):

Console:

/bin/bash,-c,sleep 2m && cat /var/log/amazon/ssm/amazon-ssm-agent.log

JSON:

"/bin/bash","-c","sleep 2m && cat /var/log/amazon/ssm/amazon-ssm-agent.log"

If you're using the awslogs log driver, then the preceding commands generate SSM Agent logs and send them to your Amazon CloudWatch log group. If you're using other log drivers or logging endpoints, then the SSM Agent logs are sent to those locations.

Example using JSON:

      "entryPoint": [],
      "portMappings": [],
      "command": [
        "/bin/bash",
        "-c",
        "sleep 2m && cat /var/log/amazon/ssm/amazon-ssm-agent.log"
      ],

Note: Different application images have different shells and editors. Review and modify the command parameters to match your application image.
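If you edit the task definition as JSON, then you can apply the diagnostic command with a small helper instead of typing it by hand. The following sketch assumes that you work on a copy of one entry from containerDefinitions and register the result as a new task definition revision yourself. The function name with_ssm_log_dump and its shell and delay parameters are hypothetical.

```python
import copy
import json

SSM_AGENT_LOG = "/var/log/amazon/ssm/amazon-ssm-agent.log"


def with_ssm_log_dump(container_definition: dict, shell: str = "/bin/bash",
                      delay: str = "2m") -> dict:
    """Return a copy of the container definition whose command waits, then prints the SSM Agent log."""
    patched = copy.deepcopy(container_definition)  # leave the caller's dict untouched
    patched["command"] = [shell, "-c", f"sleep {delay} && cat {SSM_AGENT_LOG}"]
    return patched


if __name__ == "__main__":
    # Hypothetical container definition fragment.
    original = {"name": "nginx", "entryPoint": [], "portMappings": []}
    print(json.dumps(with_ssm_log_dump(original), indent=2))
```

This replaces the container's normal command, so use the patched definition only for troubleshooting and pass shell="/bin/sh" if the image doesn't include /bin/bash.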

Related information

Using ECS Exec

AWS OFFICIAL, updated a year ago