Failed to start an EC2 task on ECS


Hi there, I am trying to start a task that uses a GPU on my EC2 instance. The instance is already registered with a cluster, but the task failed to start.

Here is the error:

status: STOPPED (CannotStartContainerError: Error response from dae)

Details
Status reason	CannotStartContainerError: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr
Network bindings - not configured

EC2 setup:

    Type: AWS::EC2::Instance
    Properties: 
      IamInstanceProfile: !Ref InstanceProfile
      ImageId: ami-0d5564ca7e0b414a9
      InstanceType: g4dn.xlarge
      KeyName: tmp-key
      SubnetId: !Ref PrivateSubnetOne
      SecurityGroupIds: 
       - !Ref ContainerSecurityGroup
      UserData: 
        Fn::Base64:
          !Sub |
          #!/bin/bash
          echo ECS_CLUSTER=traffic-data-cluster >> /etc/ecs/ecs.config
          echo ECS_ENABLED_GPU_SUPPORT=true >> /etc/ecs/ecs.config

Dockerfile

FROM nvidia/cuda:11.6.0-base-ubuntu20.04
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

# RUN nvidia-smi
RUN echo 'install pip packages'
RUN apt-get update
RUN apt-get install python3.8 -y
RUN apt-get install python3-pip -y
RUN ln -s /usr/bin/python3 /usr/bin/python

RUN pip3 --version
RUN python --version

WORKDIR /

COPY deployment/video-blurring/requirements.txt /requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install --user -r /requirements.txt

## Set up the requisite environment variables that will be passed during the build stage
ARG SERVER_ID
ARG SERVERLESS_STAGE
ARG SERVERLESS_REGION

ENV SERVER_ID=$SERVER_ID
ENV SERVERLESS_STAGE=$SERVERLESS_STAGE
ENV SERVERLESS_REGION=$SERVERLESS_REGION

COPY config/env-vars .


## Sets up the entry point for running the bashrc which contains environment variable and
## trigger the python task handler
COPY script/*.sh /
RUN ["chmod", "+x", "./initialise_task.sh"]

## Copy the code to /var/runtime - following the AWS lambda convention
## Use ADD to preserve the underlying directory structure
ADD src /var/runtime/

ENTRYPOINT ./initialise_task.sh
  • Did you figure out how to get around this? I'm hitting the same issue, also with an NVIDIA GPU.

asked 2 years ago · 606 views
1 Answer

Adding the following environment variable, passed into the Docker container through the "environment variables" configuration in ECS, fixed it:

NVIDIA_DISABLE_REQUIRE=1

Hope it works for you as well.
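
For anyone defining the task in CloudFormation like the EC2 setup in the question, here is a rough sketch of where that variable could go. The resource name, family, container name, image URI, and memory value are all placeholders and not taken from the original setup. A likely reason this helps: the nvidia/cuda base images declare a minimum CUDA/driver requirement that the NVIDIA runtime hook checks at container start, and NVIDIA_DISABLE_REQUIRE turns that check off. The error in the question is cut off after "stderr", so that cause is an assumption rather than something confirmed here.

```yaml
# Hypothetical task definition; all names and values other than the
# NVIDIA_DISABLE_REQUIRE variable are placeholders.
GpuTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: gpu-task
    RequiresCompatibilities:
      - EC2
    ContainerDefinitions:
      - Name: gpu-container
        Image: "REPLACE_WITH_IMAGE_URI"
        Memory: 4096
        # Reserve one GPU on the container instance for this container.
        ResourceRequirements:
          - Type: GPU
            Value: "1"
        # Skip the CUDA/driver version constraint check in the NVIDIA runtime hook.
        Environment:
          - Name: NVIDIA_DISABLE_REQUIRE
            Value: "1"
```

Note that disabling the check only skips the version constraint. If the host driver really is older than what the image's CUDA version needs, the application may still fail later at runtime.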

clogwog
answered 2 years ago
