What is the model (transformer) size limitation in SageMaker serverless endpoint deployment?
Is there a limitation on the size of the model we can use when creating a model and, eventually, a serverless endpoint? Is there any documentation? I did some research and ran into something similar here: https://discuss.huggingface.co/t/sagemaker-serverless-inference-for-layoutlmv2-model/14186/3. As a solution, it is advised there to set MMS_DEFAULT_WORKERS_PER_MODEL=1. I'm not sure what exactly this does. Is there any AWS documentation around it?
Hello,
There is not a hard limit on the size of a model used on a serverless endpoint as such, but there are certain limits that you will reach in practice.
1st, a serverless endpoint has a limited amount of RAM (a maximum of 6 GB), so even with MMS_DEFAULT_WORKERS_PER_MODEL=1, if your model requires more RAM than that to load and make a prediction, you will encounter errors. That environment variable tells Multi Model Server (MMS) to start only a single worker process, so only one copy of the model is loaded into memory rather than one per vCPU, which reduces total memory usage.
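As a rough sketch of where these two settings live (the model name, role ARN, image URI, and S3 path below are placeholders, not real resources), the env var goes on the model's container and the memory cap goes in the endpoint config's ServerlessConfig. The request payloads are built as plain dicts here; with boto3 they would be passed to the SageMaker client:

```python
# Hypothetical request payloads for a serverless endpoint.
# All names, ARNs, and URIs are placeholders for illustration only.
create_model_request = {
    "ModelName": "my-serverless-model",
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole",
    "PrimaryContainer": {
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<inference-image>:latest",
        "ModelDataUrl": "s3://my-bucket/model.tar.gz",
        "Environment": {
            # Limit MMS to a single worker so only one copy of the model
            # is loaded, keeping RAM usage inside the serverless cap.
            "MMS_DEFAULT_WORKERS_PER_MODEL": "1",
        },
    },
}

create_endpoint_config_request = {
    "EndpointConfigName": "my-serverless-config",
    "ProductionVariants": [
        {
            "ModelName": "my-serverless-model",
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                # 6144 MB (6 GB) is the maximum memory a serverless
                # endpoint can be given; model + inference must fit here.
                "MemorySizeInMB": 6144,
                "MaxConcurrency": 1,
            },
        }
    ],
}

# With boto3, these would be sent like so:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_request)
#   sm.create_endpoint_config(**create_endpoint_config_request)
```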
2nd, when you deploy the endpoint it needs to be up and running within 180s (3 minutes). Part of this time is spent loading your model into memory, and the larger the model, the longer that load takes. Unfortunately, I don't have any specific benchmarks on load time versus model size.
3rd, you need to consider the cold start behaviour of your endpoint. This is connected to the 2nd point, but it is worth mentioning separately. Part of the appeal of serverless endpoints is that when not needed they are completely shut down (cold), and upon request they get spun up to make a prediction. The larger the model, the longer it will take to be loaded into memory, and the longer your cold start time will be.
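One easy way to see the cold start effect is to time individual invocations: the first request after the endpoint has gone cold includes the model-load time, while subsequent (warm) requests do not. A minimal timing helper, with the real sagemaker-runtime call sketched in a comment (the endpoint name is a placeholder):

```python
import time

def timed_invoke(invoke_fn):
    """Time a single endpoint invocation. The first call after the
    endpoint has gone cold will include container spin-up and
    model-load time; warm calls will be much faster."""
    start = time.perf_counter()
    result = invoke_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Against a real serverless endpoint this would wrap the runtime call:
#   rt = boto3.client("sagemaker-runtime")
#   body, latency = timed_invoke(lambda: rt.invoke_endpoint(
#       EndpointName="my-serverless-endpoint",
#       ContentType="application/json",
#       Body=b'{"inputs": "hello"}'))
#   print(f"latency: {latency:.2f}s")
```

Comparing the latency of the first call against a few follow-up calls gives you a rough measure of your model's cold start cost.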