Deploying a 15 GB model.tar.gz: "No space left on device"


Hi, I am trying to deploy a PyTorchModel to an endpoint. The model artifact (an LLM, vigogne) is 15 GB when compressed. I have tried several things, including setting instance_type=ml.r5.4xlarge and volume_size=200, but I still get a "No space left on device" error when the file is untarred. The endpoint never even appears in the AWS console. Could you please help me resolve this issue? Thanks in advance for any ideas to try out. Best regards, Alizée

2 Answers

I assume you are deploying the SageMaker endpoint from code, but I am not sure what you mean by "The endpoint never even appear in the aws console". Do you have access to the CloudWatch logs?

kraft
Answered 7 months ago
  • Sorry for the imprecision. I mean the Endpoints section of Amazon SageMaker in the AWS console, where an endpoint usually shows as "Creating" while it is being deployed. Because the endpoint never showed up there, I could not get to the CloudWatch logs: I normally use that page to find them, and there are so many different logs in CloudWatch that I do not know how to find mine.
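
For finding the right logs without going through the console page, here is a minimal sketch. It relies on SageMaker writing endpoint logs to a CloudWatch log group named /aws/sagemaker/Endpoints/<endpoint-name>. The helper names and the logs_client argument (expected to be a boto3 CloudWatch Logs client) are assumptions made for illustration, not something from this thread.

```python
def endpoint_log_group(endpoint_name):
    """Return the CloudWatch log group that SageMaker uses for an endpoint."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def list_endpoint_log_streams(logs_client, endpoint_name, limit=10):
    """List the most recently active log streams for one endpoint.

    logs_client is assumed to be boto3.client("logs") or any object that
    exposes the same describe_log_streams call.
    """
    response = logs_client.describe_log_streams(
        logGroupName=endpoint_log_group(endpoint_name),
        orderBy="LastEventTime",  # newest activity first
        descending=True,
        limit=limit,
    )
    return [stream["logStreamName"] for stream in response.get("logStreams", [])]
```

With boto3 installed and credentials configured, this could be called as list_endpoint_log_streams(boto3.client("logs"), "my-endpoint"), where "my-endpoint" is a placeholder. If the endpoint was never created, the log group will probably not exist and the call will raise an error, which is itself a hint that the failure happened before any container started.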


So you mean that no endpoint shows up in the SageMaker Endpoints tab after you create the endpoint, right?
You can search the CloudTrail events for EventName: CreateEndpoint to check whether an error occurred when the endpoint was deployed.
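
The CloudTrail search can also be scripted. The following is a rough sketch, assuming a boto3 CloudTrail client is passed in as cloudtrail_client. The function name, the hours_back parameter, and the shape of the returned dictionaries are hypothetical choices for illustration. It uses the lookup_events call filtered on EventName and reads the errorCode and errorMessage fields that CloudTrail records for failed API calls.

```python
import json
from datetime import datetime, timedelta, timezone


def find_create_endpoint_events(cloudtrail_client, hours_back=24):
    """Return recent CreateEndpoint events with any recorded error details.

    cloudtrail_client is assumed to be boto3.client("cloudtrail") or any
    object that exposes the same lookup_events call.
    """
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=hours_back)

    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "CreateEndpoint"}
        ],
        StartTime=start_time,
        EndTime=end_time,
        MaxResults=50,
    )

    results = []
    for event in response.get("Events", []):
        # CloudTrailEvent holds the full record as a JSON string.
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        results.append(
            {
                "event_id": event.get("EventId"),
                "event_time": event.get("EventTime"),
                # Both fields are absent when the API call succeeded.
                "error_code": detail.get("errorCode"),
                "error_message": detail.get("errorMessage"),
            }
        )
    return results
```

With boto3 installed and credentials configured, find_create_endpoint_events(boto3.client("cloudtrail")) should list the recent calls in the client's region. An entry with a non-empty error_code would suggest that the CreateEndpoint request itself was rejected, which would fit with the endpoint never appearing in the console. If no events come back at all, the deployment may have failed before CreateEndpoint was ever called.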

kraft
Answered 7 months ago
