[Help/ideas wanted] Serverless Inference: Optimize cold start time


We are using SageMaker Serverless Inference, where the endpoint is wrapped with a Lambda that has a 30-second timeout (this timeout is not adjustable). Our model's cold start time is well above that (around 43 seconds). We load the model with Hugging Face Transformers and serve it through a Flask API. The model is around 1.75 GB.

Are there any guides on how to improve cold start and model loading time? Could we compile the weights differently beforehand for faster loading?
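For context, this is roughly what I had in mind by "compiling the weights beforehand": a one-time export to safetensors that the endpoint then loads at startup. This is only a sketch, not our actual setup; the model class, MODEL_ID, and EXPORT_DIR are placeholders, and it assumes a recent Transformers release plus the accelerate package for low_cpu_mem_usage.

```python
# Hypothetical one-time export step (run at build time, not in the endpoint).
# MODEL_ID and EXPORT_DIR are placeholders for illustration only.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "your-org/your-model"        # placeholder for the actual checkpoint
EXPORT_DIR = "/opt/ml/model/exported"   # baked into the model artifact/image

def export_once():
    # Download and convert once so the endpoint never pays this cost on cold start.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # safe_serialization=True writes safetensors, which load faster than
    # pickle-based .bin files because they can be memory-mapped.
    model.save_pretrained(EXPORT_DIR, safe_serialization=True)
    tokenizer.save_pretrained(EXPORT_DIR)

def load_at_startup():
    # low_cpu_mem_usage=True (requires the `accelerate` package) skips building
    # a randomly initialized model first, cutting peak memory and load time.
    model = AutoModelForSequenceClassification.from_pretrained(
        EXPORT_DIR, low_cpu_mem_usage=True
    )
    tokenizer = AutoTokenizer.from_pretrained(EXPORT_DIR)
    return model, tokenizer
```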

Richard
asked 3 years ago · 1,791 views
2 answers

Instead of loading the model object from a zip file in the Lambda session, you could load the model into ElastiCache upfront and fetch it from ElastiCache inside the Lambda instance. You might need to serialize and deserialize it, but I think it would still be faster.
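Something along these lines is what I have in mind; just a sketch, assuming a reachable ElastiCache (Redis) endpoint. The host, key names, and chunk size are placeholders, and chunking is needed because a single Redis string value is capped at 512 MB, smaller than your 1.75 GB model.

```python
# Rough sketch: push serialized weights into Redis once, pull them in Lambda.
# REDIS_HOST and MODEL_KEY are placeholders, not a tested setup.
import io

import redis
import torch

REDIS_HOST = "my-cache.xxxxxx.cache.amazonaws.com"  # placeholder endpoint
MODEL_KEY = "my-model/state_dict"
CHUNK = 256 * 1024 * 1024  # stay well under Redis's 512 MB per-value limit

r = redis.Redis(host=REDIS_HOST, port=6379)

def store_model_once(model):
    # Serialize only the weights to an in-memory buffer, then store the raw
    # bytes as numbered chunks so any Lambda instance can reassemble them.
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    data = buf.getvalue()
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    pipe = r.pipeline()
    pipe.set(f"{MODEL_KEY}:count", len(chunks))
    for i, chunk in enumerate(chunks):
        pipe.set(f"{MODEL_KEY}:{i}", chunk)
    pipe.execute()

def load_model_in_lambda(empty_model):
    # Reassemble the bytes from Redis and load them into an already-constructed
    # model object, skipping any zip extraction or disk I/O.
    count = int(r.get(f"{MODEL_KEY}:count"))
    blob = b"".join(r.get(f"{MODEL_KEY}:{i}") for i in range(count))
    state_dict = torch.load(io.BytesIO(blob), map_location="cpu")
    empty_model.load_state_dict(state_dict)
    return empty_model
```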

answered 3 years ago

Hi! Thanks for your answer. In theory, that'd be a good idea and could work. However, my other question in this forum then comes into play :D

https://repost.aws/questions/QU0JnCsfMHRrSUosWjOiOM9g/feature-request-serverless-inference-with-vpc-config

Serverless Inference currently does not support a VPC configuration. Redis clusters, however, need to be in a VPC.

Richard
answered 3 years ago
