AWS Fargate - ECS run task encountered network interface issue

0

After I created a Python job using Lambda, it processed only a small portion of the dataset rather than all of it. After a lot of troubleshooting, I came to the conclusion that I need an automated serverless service that provides sufficient storage and computing capacity (Fargate). After switching to Fargate, I ran into the following issue:

"Task stopped at: 2024-07-28T12:03:36.551Z Essential container in task exited" "There was an error while describing network interfaces. The networkInterface ID 'eni-0b13ee03ada704935' does not exist".

I found no log streams in CloudWatch even though logging is set up correctly, so I am not sure what the root cause is.

I have checked everything, and it all seems fine, from my Docker image to the task definition, roles and permissions, and VPC network configuration, but it just does not work.

Appreciate any help!

Thanks a lot.

3 Answers
2
Accepted Answer

Hi,

You may want to consider AWS Batch instead of a Fargate container to execute your jobs.

See the documentation at https://aws.amazon.com/batch/

As its name implies, it was expressly built to execute one-off autonomous jobs. Fargate is more geared toward long-running server tasks that answer multiple clients.

BTW, AWS Batch can run its jobs on Fargate but is much simpler to set up for jobs: that is the option I use (heavily!) for my long jobs.

Some good starting points:

  1. https://docs.aws.amazon.com/batch/latest/userguide/get-set-up-for-aws-batch.html
  2. https://stackify.com/aws-batch-guide/
  3. https://www.youtube.com/watch?v=k7r6i3x5d7Q

Best,

Didier

AWS
EXPERT
answered 9 months ago
EXPERT
reviewed 9 months ago
  • Thanks for your suggestion. I switched to Batch and managed to run the job with "succeeded" status and exit code "0", but the application does not seem to produce the expected result. My script runs well locally though :/

  • Hi, can you elaborate on the issue your script faces (error messages, etc.)? For such a script, my recommendation is to write plenty of log output to CloudWatch so you can see what happens at every stage of your code. If you're using Python, packages like loguru (https://github.com/Delgan/loguru) make writing such logs easy.

  • Hi, the error turns out to be related to the credentials stored in Secrets Manager. Not sure what's wrong... I have configured the permission to retrieve credentials from Secrets Manager correctly and attached it to the execution role (Batch).

    Traceback (most recent call last):
    2024-08-07T17:12:16.479+02:00 File "/var/task/bundesliga_update_ecs.py", line 201, in <module>
    2024-08-07T17:12:16.479+02:00 main()
    2024-08-07T17:12:16.479+02:00 File "/var/task/bundesliga_update_ecs.py", line 156, in main
    2024-08-07T17:12:16.479+02:00 secret = get_secret("GOOGLE_APPLICATION_CREDENTIALS")
    2024-08-07T17:12:16.479+02:00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-08-07T17:12:16.479+02:00 File "/var/task/bundesliga_update_ecs.py", line 17, in get_secret
    2024-08-07T17:12:16.479+02:00 get_secret_value_response = client.get_secret_value(SecretId=secret_name)
    2024-08-07T17:12:16.479+02:00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-08-07T17:12:16.479+02:00 File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 565, in _api_call
    2024-08-07T17:12:16.479+02:00 return self._make_api_call(operation_name, kwargs)
    2024-08-07T17:12:16.479+02:00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-08-07T17:12:16.479+02:00 File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 999, in _make_api_call
    2024-08-07T17:12:16.480+02:00 http, parsed_response = self._make_request(
    2024-08-07T17:12:16.480+02:00 ^^^^^^^^^^^^^^^^^^^
    2024-08-07T17:12:16.480+02:00 File "/usr/local/lib/python3.12/site-pa

  • The error was due to the wrong syntax for retrieving the secret (a string instead of JSON format)...
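
The fix described in the last comment might look roughly like the sketch below. It assumes the secret was stored as a JSON document: `get_secret_value` returns the payload as text in the `SecretString` field, so it has to be parsed before individual fields can be read. The helper name `load_json_secret` is hypothetical, and `client` is assumed to be a boto3 Secrets Manager client; the original script's `get_secret` function is not shown in the thread.

```python
import json


def load_json_secret(client, secret_name):
    """Fetch a secret and parse its SecretString as JSON.

    `client` is assumed to be a boto3 Secrets Manager client,
    e.g. boto3.client("secretsmanager"). The function name is
    illustrative and not taken from the original script.
    """
    response = client.get_secret_value(SecretId=secret_name)
    secret_string = response["SecretString"]  # raw text, not a dict
    return json.loads(secret_string)  # dict with the secret's fields
```

With a real client this would be called as `load_json_secret(boto3.client("secretsmanager"), "GOOGLE_APPLICATION_CREDENTIALS")`, which requires the `secretsmanager:GetSecretValue` permission on the role the code runs under.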

1

The error message you encountered indicates a problem with the Elastic Network Interface (ENI) associated with your ECS task on Fargate. Here are some steps to troubleshoot:

  1. Task Definition: networkMode should be awsvpc.
  2. Subnets and Security Groups: Verify configurations in the task definition.
  3. IAM Role: Ensure task execution role has required permissions.
  4. Service Limits: Check EC2 and VPC limits.
  5. AWS VPC Flow Logs: Enable VPC Flow Logs to capture IP traffic information.

Example IAM Policy for Task Execution Role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:CreateLogGroup",
        "ecs:DescribeTasks",
        "ecs:DescribeTaskDefinition",
        "ec2:DescribeNetworkInterfaces",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface"
      ],
      "Resource": "*"
    }
  ]
}
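
When no log stream exists, the stop reason recorded on the task itself is often the only clue. The sketch below is one way to pull those fields out of an ECS DescribeTasks response; it assumes the response is the dict returned by boto3's `describe_tasks`, and the helper name `summarize_stopped_tasks` is made up for illustration.

```python
def summarize_stopped_tasks(describe_tasks_response):
    """Collect stop reasons from an ECS DescribeTasks response dict.

    Reads only fields documented for DescribeTasks: stopCode and
    stoppedReason on the task, plus name, exitCode and reason on
    each container. Missing fields come back as None.
    """
    summaries = []
    for task in describe_tasks_response.get("tasks", []):
        containers = [
            {
                "name": c.get("name"),
                "exitCode": c.get("exitCode"),
                "reason": c.get("reason"),
            }
            for c in task.get("containers", [])
        ]
        summaries.append(
            {
                "taskArn": task.get("taskArn"),
                "stopCode": task.get("stopCode"),
                "stoppedReason": task.get("stoppedReason"),
                "containers": containers,
            }
        )
    return summaries
```

The input would come from something like `boto3.client("ecs").describe_tasks(cluster=cluster_name, tasks=[task_arn])`, where `cluster_name` and `task_arn` are placeholders for your own values.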

answered 9 months ago
  • Thanks for the guide. However, after following your instructions, the same error persists.

0

Solved! I found the answer to the Fargate issue above: you have to set up the service role and the execution role (CloudWatch Logs permissions) correctly so that you can see which errors actually occurred. After that, running the job via ECS Fargate or Batch leads to the same outcome.
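
Once the roles allow log delivery, the job still has to write something useful. A minimal sketch using only Python's standard `logging` module is shown below, assuming the container uses the `awslogs` log driver so that whatever is written to stdout ends up in CloudWatch Logs. The names `build_logger` and `run_step` are hypothetical and not part of the original script.

```python
import logging
import sys


def build_logger(name="job", stream=None):
    """Return a logger that writes to stdout (or to `stream` if given)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        target = stream if stream is not None else sys.stdout
        handler = logging.StreamHandler(target)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    logger.propagate = False  # keep output on the single handler above
    return logger


def run_step(logger, step_name, func):
    """Run one stage of the job, logging its start, end, and any failure."""
    logger.info("start %s", step_name)
    try:
        result = func()
    except Exception:
        # logger.exception records the full traceback in the log output
        logger.exception("failed %s", step_name)
        raise
    logger.info("done %s", step_name)
    return result
```

Wrapping each stage, for example `run_step(logger, "load secret", lambda: ...)`, makes it visible in the log stream which stage a failing job reached.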

answered 8 months ago
