OSError: [Errno 28] No space left on device when creating an endpoint on Amazon SageMaker

Hi, I'm trying to create an endpoint for my own pretrained model (a customized flan-t5 model).

I have already uploaded model.tar.gz to an S3 bucket called 'errocorrection', and I'm working on a SageMaker notebook instance (ml.g4dn.2xlarge) with a 100 GB EBS volume.

(model.tar.gz is 10 GB and contains: inference.py, pytorch_model.bin, tokenizer.json, tokenizer_config.json, config.json, generation_config.json, and special_tokens_map.json.)
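Since the 10 GB figure is the compressed size, the unpacked files may need noticeably more room. Here is a minimal sketch of how the uncompressed size could be checked; `summarize_archive` is a hypothetical helper name, and it assumes the archive is available locally:

```python
import tarfile


def summarize_archive(archive_path):
    """Return (file_count, total_uncompressed_bytes) for a tar archive."""
    count = 0
    total = 0
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                count += 1
                total += member.size  # size of the file once extracted
    return count, total


# Example call (assumes model.tar.gz is in the working directory):
# files, size = summarize_archive("model.tar.gz")
# print(f"{files} files, {size / 2**30:.1f} GiB uncompressed")
```

Reading a gzip-compressed archive this way decompresses it as a stream, so it can take a while on a file of this size, but it does not write anything to disk.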

import time

import sagemaker
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()
bucket = 'errocorrection'

%%bash -s "$role" "$bucket"
# Upload the model archive to the S3 bucket
ROLE=$1
BUCKET=$2
aws s3 cp model.tar.gz s3://$BUCKET/model.tar.gz

%%time
# Deploy the model from S3 to a real-time endpoint
model_path = 's3://{}/model.tar.gz'.format(bucket)
endpoint_name = "endpoint-{}".format(int(time.time()))

model = PyTorchModel(model_data=model_path,
                     role=role,
                     entry_point='inference.py',
                     framework_version='1.8.0',
                     py_version='py3')

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.2xlarge',
    endpoint_name=endpoint_name
)

The problem is that when I try to create an endpoint for this model with the code above, it fails with OSError: [Errno 28] No space left on device. I checked the disk space in the terminal with df -h, and there appears to be enough free space.
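For reference, the same check can be done from a notebook cell. This is only a sketch: `free_space_gib` is a hypothetical helper, and the paths listed are assumptions about where the root volume, the temp directory, and the notebook's EBS volume are mounted. It reports free space per path, since a full root or temp volume can raise this error even when the large volume has room:

```python
import shutil


def free_space_gib(paths):
    """Return {path: free GiB} for every path that exists on this machine."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # Path is missing or unreadable on this machine; skip it.
            continue
        report[path] = usage.free / 2**30
    return report


# Assumed mount points; adjust to match the output of df -h.
CANDIDATE_PATHS = ["/", "/tmp", "/home/ec2-user/SageMaker"]

for path, free in free_space_gib(CANDIDATE_PATHS).items():
    print(f"{path}: {free:.1f} GiB free")
```

Note that this only inspects the notebook instance. The endpoint runs on a separate instance with its own storage, so free space here does not necessarily reflect what is available there.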

I've attached a screenshot of the error and a screenshot of the df -h output showing the available disk space. Please help!

[Screenshot: troubleshooting]

[Screenshot: space]

lea
asked 4 months ago, 96 views
No Answers
