Failed to start a GPU task on an ECS EC2 instance


Hi there, I am trying to start a task that uses the GPU on my instance. The EC2 instance has already been added to the cluster, but the task fails to start with the following error:

status: STOPPED (CannotStartContainerError: Error response from dae)

Details
Status reason: CannotStartContainerError: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr
Network bindings: not configured

EC2 setup (CloudFormation):

Type: AWS::EC2::Instance
Properties:
  IamInstanceProfile: !Ref InstanceProfile
  ImageId: ami-0d5564ca7e0b414a9
  InstanceType: g4dn.xlarge
  KeyName: tmp-key
  SubnetId: !Ref PrivateSubnetOne
  SecurityGroupIds:
    - !Ref ContainerSecurityGroup
  UserData:
    Fn::Base64: !Sub |
      #!/bin/bash
      echo ECS_CLUSTER=traffic-data-cluster >> /etc/ecs/ecs.config
      echo ECS_ENABLE_GPU_SUPPORT=true >> /etc/ecs/ecs.config
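The ECS agent reads /etc/ecs/ecs.config as plain KEY=value lines, and the agent's documented GPU setting is ECS_ENABLE_GPU_SUPPORT; a misspelled key is generally just not applied rather than reported as an error. As a hedged sketch (the helper names parse_ecs_config and gpu_support_enabled are made up for illustration), the following checks a config's text for that exact key. The sample deliberately uses the misspelled variant ECS_ENABLED_GPU_SUPPORT to show how it slips through.

```python
from __future__ import annotations


def parse_ecs_config(text: str) -> dict[str, str]:
    """Parse ecs.config-style text into KEY -> value, skipping blanks and comments."""
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            config[key.strip()] = value.strip()
    return config


def gpu_support_enabled(config: dict[str, str]) -> bool:
    """True only if the agent's documented GPU key is present and set to 'true'."""
    return config.get("ECS_ENABLE_GPU_SUPPORT", "").lower() == "true"


if __name__ == "__main__":
    sample = """
ECS_CLUSTER=traffic-data-cluster
ECS_ENABLED_GPU_SUPPORT=true
"""
    cfg = parse_ecs_config(sample)
    print(cfg)
    # The misspelled key parses fine, but it is not the key the agent reads.
    print(gpu_support_enabled(cfg))  # False
```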

Dockerfile:

FROM nvidia/cuda:11.6.0-base-ubuntu20.04
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# RUN nvidia-smi
RUN echo 'install pip packages'
RUN apt-get update && \
    apt-get install -y python3.8 python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN ln -s /usr/bin/python3 /usr/bin/python

RUN pip3 --version
RUN python --version

WORKDIR /

COPY deployment/video-blurring/requirements.txt /requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install --user -r /requirements.txt

## Build arguments (passed with --build-arg at build time), persisted below as runtime environment variables
ARG SERVER_ID
ARG SERVERLESS_STAGE
ARG SERVERLESS_REGION

ENV SERVER_ID=$SERVER_ID
ENV SERVERLESS_STAGE=$SERVERLESS_STAGE
ENV SERVERLESS_REGION=$SERVERLESS_REGION

COPY config/env-vars .


## Set up the entry point, which sources the bashrc containing the environment variables and
## triggers the Python task handler
COPY script/*.sh /
RUN ["chmod", "+x", "./initialise_task.sh"]

## Copy the code to /var/runtime, following the AWS Lambda convention
## (COPY preserves the directory structure under src/ just as ADD does)
COPY src /var/runtime/

ENTRYPOINT ./initialise_task.sh
  • Did you figure out how to get around this? (I'm seeing the same issue, also using an NVIDIA GPU.)

asked 2 years ago, 574 views
1 Answer

Adding the following environment variable to the container's environment variables configuration in the ECS task definition fixed it for me:

NVIDIA_DISABLE_REQUIRE=1
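In task-definition JSON, that setting lives in the container definition's environment list as a name/value pair. A minimal Python sketch that prints the relevant fragment; the container name and image URI are hypothetical placeholders, and only the fields that matter here are shown (a complete EC2 launch type task definition also needs, for example, a memory setting).

```python
import json

# Hypothetical placeholders: neither the container name nor the image URI
# comes from the question.
container_definition = {
    "name": "video-blurring",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/video-blurring:latest",
    "essential": True,
    # Reserve one GPU for this container (a g4dn.xlarge has a single GPU).
    "resourceRequirements": [{"type": "GPU", "value": "1"}],
    # Set in the container at start-up; tells the NVIDIA runtime hook to skip
    # its NVIDIA_REQUIRE_* compatibility checks.
    "environment": [{"name": "NVIDIA_DISABLE_REQUIRE", "value": "1"}],
}

print(json.dumps({"containerDefinitions": [container_definition]}, indent=2))
```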

Hope it works for you as well.

clogwog
answered 2 years ago
