Hi there, I am trying to start a task that uses a GPU on my EC2 instance. The instance is already registered to the cluster,
but the task fails to start.
Here is the error:
status: STOPPED (CannotStartContainerError: Error response from dae)
Details
Status reason CannotStartContainerError: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr
Network bindings - not configured
EC2 setup:
Type: AWS::EC2::Instance
Properties:
  IamInstanceProfile: !Ref InstanceProfile
  ImageId: ami-0d5564ca7e0b414a9
  InstanceType: g4dn.xlarge
  KeyName: tmp-key
  SubnetId: !Ref PrivateSubnetOne
  SecurityGroupIds:
    - !Ref ContainerSecurityGroup
  UserData:
    Fn::Base64:
      !Sub |
        #!/bin/bash
        echo ECS_CLUSTER=traffic-data-cluster >> /etc/ecs/ecs.config
        echo ECS_ENABLED_GPU_SUPPORT=true >> /etc/ecs/ecs.config
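The task definition is not shown in this post. For comparison, here is a minimal sketch of how a container typically requests a GPU from ECS in CloudFormation. The resource name `GpuTaskDefinition`, the family, the container name, the image URI, and the memory value are all hypothetical placeholders, not taken from the setup above; only the `ResourceRequirements` entry is the point of the example.

```yaml
# Hypothetical task definition; every name and value here is a placeholder.
GpuTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: video-blurring-task          # assumed name
    RequiresCompatibilities:
      - EC2
    ContainerDefinitions:
      - Name: video-blurring             # assumed name
        Image: "<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>"  # placeholder
        Memory: 4096                     # assumed value, in MiB
        Essential: true
        ResourceRequirements:
          # Ask ECS to reserve one physical GPU for this container.
          - Type: GPU
            Value: "1"
```

It may also be worth comparing the agent setting in the user data against the ECS agent documentation: the documented variable is `ECS_ENABLE_GPU_SUPPORT`, while the user data above writes `ECS_ENABLED_GPU_SUPPORT`.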
Dockerfile:
FROM nvidia/cuda:11.6.0-base-ubuntu20.04
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
# RUN nvidia-smi
RUN echo 'install pip packages'
RUN apt-get update && apt-get install -y python3.8 python3-pip
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN pip3 --version
RUN python --version
WORKDIR /
COPY deployment/video-blurring/requirements.txt /requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install --user -r /requirements.txt
## Set up the requisite environment variables that will be passed during the build stage
ARG SERVER_ID
ARG SERVERLESS_STAGE
ARG SERVERLESS_REGION
ENV SERVER_ID=$SERVER_ID
ENV SERVERLESS_STAGE=$SERVERLESS_STAGE
ENV SERVERLESS_REGION=$SERVERLESS_REGION
COPY config/env-vars .
## Set up the entry point, which runs the bashrc containing the environment variables
## and triggers the Python task handler
COPY script/*.sh /
RUN ["chmod", "+x", "./initialise_task.sh"]
## Copy the code to /var/runtime - following the AWS lambda convention
## Use ADD to preserve the underlying directory structure
ADD src /var/runtime/
ENTRYPOINT ./initialise_task.sh
Did you figure out how to get around this? (I'm hitting the same issue, also using an NVIDIA GPU.)