
model_fn called multiple times (1 per GPU) during deployment


I'm trying to deploy a quantized Llama 3.1 70B model and wrote a custom model_fn to read it from disk. After several errors and some investigation, I noticed that the loading function model_fn is called multiple times, once per GPU (I tried two instance sizes, and the number of calls always matched the number of GPUs on the instance). The first load succeeds, but every call after that fails with an out-of-memory error, because the model is being loaded onto GPUs that are already occupied. I tried adding flags to detect that the model was already loaded (and skip loading it again), roughly as sketched below, without success. Is this a bug? I was expecting model_fn to run just once to load the model during deployment.
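Roughly what I tried (a simplified sketch; the names and the transformers-based loading below are placeholders, the real script loads the quantized checkpoint from disk):

# inference.py -- simplified sketch, not the exact deployment code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

_MODEL = None  # module-level flag/cache


def model_fn(model_dir):
    """Load the model once per worker process.

    The global cache only guards against repeated calls inside the same
    process; it cannot stop another worker process (one per GPU) from
    loading its own copy and exhausting GPU memory.
    """
    global _MODEL
    if _MODEL is not None:
        return _MODEL

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,
        device_map="auto",  # spreads the 70B shards across all visible GPUs
    )
    _MODEL = (model, tokenizer)
    return _MODEL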

asked 8 months ago · 101 views
1 Answer

Hello,

The model_fn function provided in the entry script can be called by SageMaker once per instance when an endpoint is first deployed or updated. Since you are observing multiple calls per instance that lead to out-of-memory errors, please contact AWS SageMaker Premium Support with the endpoint ARN, sample scripts, and logs so we can dive deeper into the root cause.

I also came across a few similar reports of the model_fn function being called multiple times: https://github.com/aws/amazon-sagemaker-examples/issues/341 https://github.com/aws/amazon-sagemaker-examples/issues/1073
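While you prepare the support case, one quick way to capture evidence is to log the process ID and the visible devices at the top of model_fn; a separate log line per GPU in the endpoint's CloudWatch logs would show that each model server worker process is invoking model_fn on its own. A rough sketch (the logger name and log format are illustrative, not SageMaker-specific APIs):

# Diagnostic sketch -- add the logging to your existing model_fn
import logging
import os

import torch

logger = logging.getLogger(__name__)


def model_fn(model_dir):
    # One log line like this per worker process would confirm that each
    # model server worker calls model_fn independently, and show which
    # GPUs that worker can see.
    logger.info(
        "model_fn called: pid=%s CUDA_VISIBLE_DEVICES=%s device_count=%s",
        os.getpid(),
        os.environ.get("CUDA_VISIBLE_DEVICES"),
        torch.cuda.device_count(),
    )
    # ... your existing model loading code goes here ...
    return None  # placeholder; keep your real return value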

AWS
SUPPORT ENGINEER
answered 8 months ago

