How do I resolve network interface provision errors for Amazon ECS on Fargate?

I want to resolve elastic network interface provision errors for Amazon Elastic Container Service (Amazon ECS) on AWS Fargate.

Short description

Network interface provision errors occur when Fargate experiences API issues with the underlying host. You receive one of the following errors:

  • Fargate tries to attach a network interface to the underlying infrastructure that the task runs on, but the attachment doesn't complete before the timeout. You receive the following error message: "Timeout waiting for network interface provisioning to complete."
  • Your Fargate tasks can't launch because the network interface wasn't created during the task provisioning state. You receive the following error message: "Network interface provision complete error timeout wait for network interface provision."
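To tell these two errors apart from other task failures in automation, you can compare a task's stopped reason against the quoted messages. The following Python sketch assumes that the stopped reason (for example, the stoppedReason field that DescribeTasks returns) contains one of the two messages above. The helper name is_eni_provision_error is hypothetical, and the exact wording that your tasks report might differ.

```python
from typing import Optional

# Lowercased forms of the two error messages quoted in this article.
ENI_PROVISION_ERROR_MARKERS = (
    "timeout waiting for network interface provisioning to complete",
    "network interface provision complete error timeout wait for network interface provision",
)


def is_eni_provision_error(stopped_reason: Optional[str]) -> bool:
    """Return True when the stopped reason contains one of the two quoted messages."""
    if not stopped_reason:
        return False
    # Normalize case and whitespace so minor formatting differences still match.
    normalized = " ".join(stopped_reason.lower().split())
    return any(marker in normalized for marker in ENI_PROVISION_ERROR_MARKERS)
```

A caller can use the result to decide whether a relaunch is worthwhile, because a retry is unlikely to help with failures that come from the container itself.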

Amazon Elastic Compute Cloud (Amazon EC2) provisions the elastic network interface (ENI) asynchronously, and the process takes time. Amazon ECS applies a timeout to guard against long wait times and unreported failures. Sometimes the ENI is provisioned, but the report reaches Amazon ECS after the timeout expires. In this case, Amazon ECS reports the task as failed even though the ENI is in use.

ENI attachment failures are sometimes not preventable. Possible causes include:

  • Amazon EC2 attachment and propagation delays
  • AWS Systems Manager (SSM) delays in activating the Amazon EC2 instance
  • The Fargate or Amazon ECS agent is unable to send acknowledgment back to the Amazon ECS control plane

Note: To determine whether the creation of the network interface caused the issue, manually create a test network interface in the same subnet as your Fargate task. You can also check the AWS Health Dashboard for API issues.
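The manual test in the preceding note can also be scripted. The following Python sketch assumes that you pass in an object that behaves like a boto3 EC2 client and uses its create_network_interface and delete_network_interface calls. The function name probe_eni_creation and the description text are hypothetical. Depending on your VPC setup, you might also need to pass security groups.

```python
def probe_eni_creation(ec2_client, subnet_id):
    """Create a test ENI in the subnet, then delete it.

    Returns (True, eni_id) when creation succeeds, or (False, error_text)
    when the create call raises an error.
    """
    try:
        response = ec2_client.create_network_interface(
            SubnetId=subnet_id,
            Description="Temporary ENI creation test",
        )
    except Exception as error:  # boto3 raises botocore ClientError for API failures
        return False, str(error)

    eni_id = response["NetworkInterface"]["NetworkInterfaceId"]
    # Remove the test interface so that it doesn't consume an IP address.
    ec2_client.delete_network_interface(NetworkInterfaceId=eni_id)
    return True, eni_id
```

If the call fails, the error text usually indicates whether the cause is a permissions issue, an exhausted subnet, or an API problem.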

Resolution

The resolution depends on how your Fargate task is launched.

If the Fargate task is part of an Amazon ECS service, then the Amazon ECS service scheduler automatically tries to launch the task again.

If the task is triggered by Amazon EventBridge, then EventBridge doesn't validate that the action defined by the rule runs successfully. If the event is critical, then implement a mechanism that validates the action ran and retries if it didn't.

If you launch the task with the RunTask API, then the workflow is asynchronous. When the workflow starts successfully, a success code is returned. However, the success code doesn't mean that the task is in a RUNNING state. For tasks that you launch manually with the RunTask API, you must retry the launch manually.
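Because a successful RunTask response doesn't confirm that the task is running, a manual retry has to check the task status itself. The following Python sketch shows one way to do that, assuming an object that behaves like a boto3 ECS client (run_task and describe_tasks). The function name launch_with_retry and its default values are hypothetical. The sketch doesn't wait between launch attempts, and a very short-lived task can stop before a poll sees it in a RUNNING state.

```python
import time


def launch_with_retry(ecs_client, run_task_kwargs, max_attempts=3,
                      poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Call RunTask, then poll DescribeTasks until the task is RUNNING.

    Relaunches the task when it stops before it reaches RUNNING.
    Returns the ARN of the running task, or raises RuntimeError when
    every attempt fails.
    """
    cluster = run_task_kwargs["cluster"]
    last_reason = None
    for _attempt in range(max_attempts):
        response = ecs_client.run_task(**run_task_kwargs)
        if response.get("failures") or not response.get("tasks"):
            last_reason = str(response.get("failures"))
            continue
        task_arn = response["tasks"][0]["taskArn"]
        for _poll in range(max_polls):
            described = ecs_client.describe_tasks(cluster=cluster, tasks=[task_arn])
            tasks = described.get("tasks", [])
            status = tasks[0]["lastStatus"] if tasks else None
            if status == "RUNNING":
                return task_arn
            if status == "STOPPED":
                last_reason = tasks[0].get("stoppedReason")
                break  # Launch a new task on the next attempt.
            sleep(poll_seconds)
        else:
            # The task is still pending; don't launch a duplicate.
            raise TimeoutError(f"Task {task_arn} did not reach RUNNING or STOPPED")
    raise RuntimeError(
        f"Task did not reach RUNNING after {max_attempts} attempts: {last_reason}"
    )
```

The run_task_kwargs dictionary carries the same arguments that you already pass to RunTask, such as cluster, taskDefinition, launchType, and networkConfiguration.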

To automate retries with exponential backoff and retry logic, create an AWS Step Functions state machine. For more information, see Retrying after an error.
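To see what exponential backoff means in practice, the following Python sketch computes the wait before each retry from the three retrier fields that the state machine definition in this article uses (IntervalSeconds of 10, BackoffRate of 2, MaxAttempts of 3). The function name retry_wait_seconds is hypothetical. The sketch ignores optional retrier settings such as a maximum delay or jitter.

```python
def retry_wait_seconds(interval_seconds, backoff_rate, max_attempts):
    """Return the wait before each retry: the interval multiplied by the rate each time."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


waits = retry_wait_seconds(interval_seconds=10, backoff_rate=2, max_attempts=3)
print(waits)       # [10, 20, 40]
print(sum(waits))  # 70 seconds of waiting in total if every retry is used
```

With these values, the task is tried once and then retried up to three times before the error moves on to the Catch block.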

Create a Step Functions state machine

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, confirm that you're using the most recent AWS CLI version.

To create a state machine that synchronously runs the ECS RunTask operation, complete the following steps:

  1. Open the AWS Step Functions console.

  2. Choose Create state machine.

  3. Choose Write your workflow in code.

  4. For Type, choose Standard. For more information about the different types of workflow, see Choosing workflow type in Step Functions.

  5. In the Definition section, enter the following code:

    {
      "Comment": "A description of my state machine",
      "StartAt": "ECS RunTask",
      "States": {
        "ECS RunTask": {
          "Type": "Task",
          "Resource": "arn:aws:states:::ecs:runTask.sync",
          "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": "<ClusterARN>",
            "TaskDefinition": "<TaskDefinitionARN>",
            "NetworkConfiguration": {
              "AwsvpcConfiguration": {
                "Subnets": [<Subnets>],
                "SecurityGroups": [<SecurityGroups>],
                "AssignPublicIp": "<ENABLED or DISABLED>"
              }
            }
          },
          "Next": "Notify Success",
          "Retry": [
            {
              "ErrorEquals": [
                "States.ALL"
              ],
              "BackoffRate": 2,
              "MaxAttempts": 3,
              "IntervalSeconds": 10
            }
          ],
          "Catch": [
            {
              "ErrorEquals": [
                "States.ALL"
              ],
              "Next": "TransformData"
            }
          ]
        },
        "TransformData": {
          "Type": "Pass",
          "Next": "Notify Failure",
          "Parameters": {
            "Error.$": "$.Error",
            "Cause.$": "States.StringToJson($.Cause)"
          }
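        },
        "Notify Failure": {
          "Type": "Task",
          "Resource": "arn:aws:states:::sns:publish",
          "Parameters": {
            "Subject": "ECS task launch failed",
            "Message.$": "States.JsonToString($)",
            "TopicArn": "<Topic ARN>"
          },
          "End": true
        },
        "Notify Success": {
          "Type": "Task",
          "Resource": "arn:aws:states:::sns:publish",
          "Parameters": {
            "Subject": "ECS task launch succeeded",
            "Message": "The ECS task that Step Functions launched completed successfully.",
            "TopicArn": "<Topic ARN>"
          },
          "End": true
        }
      }
    }
    

    Note: Replace ClusterARN with your cluster ARN. Replace TaskDefinitionARN with your task definition ARN. Replace Subnets with your subnet IDs. Replace SecurityGroups with your security group IDs. Replace the AssignPublicIp value with ENABLED or DISABLED. Replace Topic ARN with your Amazon SNS topic ARN. The Subject and Message values in the two notification states are example text that you can change.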

  6. Choose Next.

  7. For Name, enter a name for your state machine.

  8. Select a role to run the state machine and related resources. Use a role with least privilege and include only the permissions necessary for your AWS Identity and Access Management (IAM) policies. The following example policies include only the necessary permissions:

    ECS policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ecs:RunTask"
          ],
          "Resource": [
            "arn:aws:ecs:*:123456789012:task-definition/<TASK_DEFINITION>"
          ],
          "Condition": {
            "ArnLike": {
              "ecs:cluster": "arn:aws:ecs:*:123456789012:cluster/<ECS_CLUSTER>"
            }
          }
        },
        {
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": [
            "*"
          ],
          "Condition": {
            "StringLike": {
              "iam:PassedToService": "ecs-tasks.amazonaws.com"
            }
          }
        },
        {
          "Effect": "Allow",
          "Action": [
            "ecs:StopTask",
            "ecs:DescribeTasks"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "events:PutTargets",
            "events:PutRule",
            "events:DescribeRule"
          ],
          "Resource": [
            "arn:aws:events:us-east-1:123456789012:rule/StepFunctionsGetEventsForECSTaskRule"
          ]
        }
      ]
    }
    

    Note: Replace TASK_DEFINITION with your task definition name. Replace ECS_CLUSTER with your cluster name.

    Amazon SNS policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "sns:Publish"
          ],
          "Resource": [
            "arn:aws:sns:us-east-1:123456789012:<TOPIC>"
          ]
        }
      ]
    }
    

    Note: Replace TOPIC with your Amazon Simple Notification Service (Amazon SNS) topic name.

    Amazon CloudWatch policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogDelivery",
            "logs:GetLogDelivery",
            "logs:UpdateLogDelivery",
            "logs:DeleteLogDelivery",
            "logs:ListLogDeliveries",
            "logs:PutResourcePolicy",
            "logs:DescribeResourcePolicies",
            "logs:DescribeLogGroups"
          ],
          "Resource": "*"
        }
      ]
    }
    
  9. To create the necessary CloudWatch log streams, choose a Log level.

  10. Choose Create state machine.
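Before you paste a definition into the console, it can help to confirm that every transition points to a state that exists. The following Python sketch is a simplified check, not the Step Functions validator. It covers only StartAt, Next, and Catch targets, and it doesn't inspect Choice states. The function name find_missing_targets is hypothetical. The skeleton reuses the state names from this article and deliberately leaves out a Notify Success state to show what the check reports.

```python
def find_missing_targets(definition):
    """Return (source, target) pairs whose target isn't a defined state."""
    states = definition.get("States", {})
    referenced = [("StartAt", definition.get("StartAt"))]
    for name, state in states.items():
        if "Next" in state:
            referenced.append((name, state["Next"]))
        for catcher in state.get("Catch", []):
            referenced.append((name, catcher.get("Next")))
    return [(source, target) for source, target in referenced if target not in states]


# Trimmed skeleton of the workflow: only the fields that affect transitions.
skeleton = {
    "StartAt": "ECS RunTask",
    "States": {
        "ECS RunTask": {
            "Type": "Task",
            "Next": "Notify Success",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "TransformData"}],
        },
        "TransformData": {"Type": "Pass", "Next": "Notify Failure"},
        "Notify Failure": {"Type": "Task", "End": True},
    },
}

print(find_missing_targets(skeleton))  # [('ECS RunTask', 'Notify Success')]
```

An empty list means that every referenced state is defined. To check a real definition, load it with json.loads and pass the result to the function.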

Integrate your state machine with Amazon EventBridge

To integrate your state machine with EventBridge, complete the following steps:

  1. Open the Amazon EventBridge console.
  2. In the navigation pane, choose Rules.
  3. Choose Create rule.
  4. Choose Schedule. Or, to use an event-driven response, choose Event. For more information, see Creating Amazon EventBridge event patterns.
  5. Choose Add target.
  6. From the dropdown list, choose Step Functions state machine.
  7. Choose your state machine.
  8. Choose a role with the appropriate permissions to run the state machine.
  9. Choose Configure details and enter a name and description for your rule.
  10. Choose Create rule.
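If you choose the event-driven option, then the rule needs an event pattern. The following Python sketch builds one possible pattern that matches ECS Task State Change events for stopped tasks in a single cluster. Treating stopped tasks as the trigger is an assumption for illustration, because this article doesn't specify which event to use. The function name stopped_task_pattern and the example cluster ARN are hypothetical.

```python
import json


def stopped_task_pattern(cluster_arn):
    """Build an event pattern for stopped ECS tasks in one cluster."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {
            "clusterArn": [cluster_arn],
            "lastStatus": ["STOPPED"],
        },
    }


# Example cluster ARN for illustration only.
pattern_json = json.dumps(
    stopped_task_pattern("arn:aws:ecs:us-east-1:123456789012:cluster/example-cluster"),
    indent=2,
)
print(pattern_json)
```

You can paste the printed JSON into the event pattern editor when you create the rule.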

Related information

AwsVpcConfiguration

AWS OFFICIAL Updated 2 months ago