How do I resolve network interface provision errors for Amazon ECS on Fargate?

I want to resolve elastic network interface provision errors for Amazon Elastic Container Service (Amazon ECS) on AWS Fargate.

Short description

These errors occur when Fargate experiences API issues with the underlying host. You might receive either of the following error messages:

  • "Timeout waiting for network interface provisioning to complete." Fargate timed out when it tried to attach a network interface to the underlying infrastructure that the task runs on.
  • "Network interface provision complete error timeout wait for network interface provision." Your Fargate tasks can't launch because the network interface wasn't created during the task provisioning state.

Note: To determine whether the creation of the network interface caused the issue, manually create a test network interface in the same subnet as your Fargate task. You can also check the AWS Health Dashboard for API issues.
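The manual check in the preceding note can also be scripted. The following Python function is a minimal sketch, not an official procedure. It assumes that you pass in a boto3 EC2 client and that your credentials allow ec2:CreateNetworkInterface and ec2:DeleteNetworkInterface. The function name and the description text are illustrative.

```python
def probe_network_interface(ec2_client, subnet_id):
    """Create and then delete a throwaway network interface in subnet_id.

    Returns the ID of the interface that was created. Any exception raised by
    create_network_interface (for example, no free IP addresses in the subnet
    or an API problem) propagates to the caller.
    """
    response = ec2_client.create_network_interface(
        SubnetId=subnet_id,
        Description="Temporary test interface for Fargate troubleshooting",
    )
    eni_id = response["NetworkInterface"]["NetworkInterfaceId"]
    # Clean up right away so the test interface doesn't hold an IP address.
    ec2_client.delete_network_interface(NetworkInterfaceId=eni_id)
    return eni_id
```

For example, with boto3 installed, you might call probe_network_interface(boto3.client("ec2"), "<SubnetID>"). If the call succeeds, then network interface creation in that subnet is probably not the cause of the error.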

Resolution

If the Fargate task is part of an ECS service, then the ECS service scheduler tries to automatically launch the task again.

When you use the RunTask API to launch a task, the workflow is asynchronous. A success code means only that the workflow started, not that the task reached the RUNNING state. If you launch tasks manually with the RunTask API, then you must also retry failed launches manually.
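Because RunTask returns before the task runs, a caller must check the task status separately. The following Python function is a minimal sketch of that check. It assumes that you pass in a boto3 ECS client and the task ARN that RunTask returned. The function name, the polling interval, and the poll limit are illustrative choices.

```python
import time


def wait_for_task(ecs_client, cluster, task_arn,
                  poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Poll DescribeTasks until the task is RUNNING or STOPPED.

    Returns (last_status, stopped_reason). stopped_reason is None unless the
    task stopped, for example because of a network interface provision error.
    """
    for _ in range(max_polls):
        response = ecs_client.describe_tasks(cluster=cluster, tasks=[task_arn])
        if not response["tasks"]:
            raise RuntimeError(f"Task not found: {response.get('failures')}")
        task = response["tasks"][0]
        status = task["lastStatus"]
        if status == "RUNNING":
            return status, None
        if status == "STOPPED":
            return status, task.get("stoppedReason")
        # Still PROVISIONING or PENDING: wait before the next check.
        sleep(poll_seconds)
    raise TimeoutError(f"Task {task_arn} did not reach RUNNING or STOPPED")
```

If the function returns STOPPED with a network interface provision message as the reason, then launch the task again.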

To automate retries with exponential backoff, create an AWS Step Functions state machine. For more information, see Retrying after an error.
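To see what exponential backoff does in practice, the following Python sketch computes the wait before each retry from the same three settings that a Step Functions Retry block uses. It uses the values from the state machine definition in this article (IntervalSeconds 10, BackoffRate 2, MaxAttempts 3). The function names and the launch callable are hypothetical. The launch callable is assumed to raise an exception when the task fails to start.

```python
import time


def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Return the wait, in seconds, before each retry attempt."""
    return [interval_seconds * backoff_rate ** attempt
            for attempt in range(max_attempts)]


def run_with_retries(launch, delays, sleep=time.sleep):
    """Call launch(); after each failure, wait and retry once per delay."""
    for delay in delays:
        try:
            return launch()
        except Exception:
            sleep(delay)
    # Final attempt: let its exception propagate to the caller.
    return launch()


print(retry_delays(10, 2, 3))  # [10, 20, 40]
```

With these values, a launch that keeps failing is attempted four times in total: the first attempt plus three retries after waits of 10, 20, and 40 seconds.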

Create a Step Functions state machine

To create a state machine that synchronously runs the ECS RunTask operation, complete the following steps:

  1. Open the Step Functions console.

  2. Choose Create state machine.

  3. Choose Write your workflow in code.

  4. For Type, choose Standard. For more information about the different types of workflow, see Standard vs. Express Workflows.

  5. In the Definition section, enter the following code:

    {
      "Comment": "A description of my state machine",
      "StartAt": "ECS RunTask",
      "States": {
        "ECS RunTask": {
          "Type": "Task",
          "Resource": "arn:aws:states:::ecs:runTask.sync",
          "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": "<ClusterARN>",
            "TaskDefinition": "<TaskDefinitionARN>",
            "NetworkConfiguration": {
              "AwsvpcConfiguration": {
                "Subnets": [<Subnets>],
                "SecurityGroups": [<SecurityGroups>],
                "AssignPublicIp": "ENABLED" | "DISABLED"
              }
            }
          },
          "Next": "Notify Success",
          "Retry": [
            {
              "ErrorEquals": [
                "States.ALL"
              ],
              "BackoffRate": 2,
              "MaxAttempts": 3,
              "IntervalSeconds": 10
            }
          ],
          "Catch": [
            {
              "ErrorEquals": [
                "States.ALL"
              ],
              "Next": "TransformData"
            }
          ]
        },
        "TransformData": {
          "Type": "Pass",
          "Next": "Notify Failure",
          "Parameters": {
            "Error.$": "$.Error",
            "Cause.$": "States.StringToJson($.Cause)"
          }
        },
        "Notify Failure": {
          "Type": "Task",
          "Resource": "arn:aws:states:::sns:publish",
          "Parameters": {
            "TopicArn": "<Topic ARN>",
            "Message": {
              "Error.$": "$.Error",
              "StoppedReason.$": "$.Cause.StoppedReason"
            }
          },
          "End": true
        },
        "Notify Success": {
          "Type": "Task",
          "Resource": "arn:aws:states:::sns:publish",
          "Parameters": {
            "TopicArn": "<Topic ARN>",
            "Message": "AWS ECS Task started by Step Functions succeeded"
          },
          "End": true
        }
      }
    }

  6. Choose Next.

  7. For Name, enter a name for your state machine.

  8. Select an IAM role that allows the state machine to run and to access related resources. It's a best practice to grant least privilege, so include only the permissions that the role needs in its IAM policies. The following example policies include only the necessary permissions:
    ECS policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ecs:RunTask"
          ],
          "Resource": [
            "arn:aws:ecs:*:123456789012:task-definition/<TASK_DEFINITION>"
          ],
          "Condition": {
            "ArnLike": {
              "ecs:cluster": "arn:aws:ecs:*:123456789012:cluster/<ECS CLUSTER>"
            }
          }
        },
        {
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": [
            "*"
          ],
          "Condition": {
            "StringLike": {
              "iam:PassedToService": "ecs-tasks.amazonaws.com"
            }
          }
        },
        {
          "Effect": "Allow",
          "Action": [
            "ecs:StopTask",
            "ecs:DescribeTasks"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "events:PutTargets",
            "events:PutRule",
            "events:DescribeRule"
          ],
          "Resource": [
            "arn:aws:events:us-east-1:123456789012:rule/StepFunctionsGetEventsForECSTaskRule"
          ]
        }
      ]
    }

    SNS policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "sns:Publish"
          ],
          "Resource": [
            "arn:aws:sns:us-east-1:123456789012:<TOPIC>"
          ]
        }
      ]
    }

    Amazon CloudWatch policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogDelivery",
            "logs:GetLogDelivery",
            "logs:UpdateLogDelivery",
            "logs:DeleteLogDelivery",
            "logs:ListLogDeliveries",
            "logs:PutResourcePolicy",
            "logs:DescribeResourcePolicies",
            "logs:DescribeLogGroups"
          ],
          "Resource": "*"
        }
      ]
    }

  9. To create the necessary CloudWatch log streams, choose a Log level.

  10. Choose Create state machine.

Integrate your state machine with Amazon EventBridge

Complete the following steps:

  1. Open the Amazon EventBridge console.
  2. In the navigation pane, choose Events, and then choose Rules.
  3. Choose Create rule.
  4. To run the state machine on a schedule, choose Schedule. To run it in response to an event instead, choose Event. For more information, see Amazon EventBridge event patterns.
  5. Choose Add target.
  6. From the dropdown list, choose Step Functions state machine.
  7. Choose your state machine.
  8. Choose a role with the appropriate permissions to run the state machine.
  9. Choose Configure details, and then enter a name and description for your rule.
  10. Choose Create rule.
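You can also create the same scheduled rule and target from code instead of the console. The following Python function is a minimal sketch that assumes you pass in a boto3 EventBridge ("events") client. The function name and the target ID are illustrative, and the role is assumed to allow EventBridge to start executions of your state machine.

```python
def schedule_state_machine(events_client, rule_name, schedule_expression,
                           state_machine_arn, role_arn):
    """Create a scheduled rule that starts the state machine.

    schedule_expression is a rate or cron expression, such as "rate(1 hour)".
    Returns the ARN of the rule.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )
    result = events_client.put_targets(
        Rule=rule_name,
        Targets=[
            {
                "Id": "run-ecs-task-state-machine",  # illustrative target ID
                "Arn": state_machine_arn,
                "RoleArn": role_arn,
            }
        ],
    )
    # PutTargets reports per-target failures in the response body.
    if result.get("FailedEntryCount", 0):
        raise RuntimeError(f"Could not add target: {result['FailedEntries']}")
    return rule["RuleArn"]
```

For example, with boto3 installed, you might call schedule_state_machine(boto3.client("events"), "<RuleName>", "rate(1 hour)", "<StateMachineARN>", "<RoleARN>").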

Related information

AwsVpcConfiguration

AWS OFFICIAL
Updated 9 months ago