[Help/ideas wanted] Serverless Inference: Optimize cold start time


We are using SageMaker Serverless Inference, where the endpoint is wrapped with a Lambda function that has a 30-second timeout (this timeout is not adjustable). Our model's cold start time is well above that, around 43 seconds. We load the model with Hugging Face transformers and serve it through a Flask API. The model size is around 1.75 GB.
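Roughly, the serving side looks like the sketch below (simplified; the model class and paths are placeholders rather than our actual code), with the model loaded once when the container starts, which is where the ~43 seconds go:

```python
# Simplified sketch of the serving container: the model is loaded once at
# container start, which is where the cold start time is spent. The model
# class and paths are placeholders.
import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "/opt/ml/model"  # default SageMaker model directory

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

app = Flask(__name__)


@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker health check
    return "", 200


@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json()
    inputs = tokenizer(payload["text"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return jsonify({"prediction": logits.argmax(dim=-1).tolist()})
```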

Are there any guides on how to improve cold start and model loading time? Could we compile the weights differently beforehand for faster loading?
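For example (untested idea), one form of preparing the weights beforehand could be re-saving the checkpoint as safetensors, which can be memory-mapped at load time, and then loading with `low_cpu_mem_usage` to avoid an extra in-memory copy. The model name and paths below are placeholders:

```python
# One-time conversion step (done offline): re-save the checkpoint in the
# safetensors format. Names and paths are placeholders.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

SRC = "my-org/my-model"        # hypothetical source checkpoint
DST = "./model-safetensors"    # hypothetical output directory

model = AutoModelForSequenceClassification.from_pretrained(SRC)
AutoTokenizer.from_pretrained(SRC).save_pretrained(DST)
model.save_pretrained(DST, safe_serialization=True)

# At serving time: load the converted checkpoint; low_cpu_mem_usage requires
# the `accelerate` package to be installed.
model = AutoModelForSequenceClassification.from_pretrained(
    DST, low_cpu_mem_usage=True
)
```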

Richard
asked 2 years ago · 1595 views
2 Answers

Instead of loading the model object from a zip file in the Lambda session, you could load the model into ElastiCache up front and pull it into the Lambda instance from ElastiCache. You might need to serialize and deserialize it, but I think it would still be faster.
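A rough sketch of what I mean (the cache endpoint, key names, and model class are placeholders; note that a single Redis value is capped at 512 MB, so a ~1.75 GB blob has to be stored in chunks):

```python
# Sketch: push the serialized state dict into ElastiCache (Redis) once, then
# rebuild the model from the cached bytes at cold start. Endpoint, key names,
# and model class are placeholders; chunking is needed because a Redis value
# cannot exceed 512 MB.
import io

import redis
import torch
from transformers import AutoConfig, AutoModelForSequenceClassification

REDIS_HOST = "my-cache.xxxxxx.cache.amazonaws.com"  # hypothetical endpoint
KEY_PREFIX = "model:weights"
CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB per chunk

r = redis.Redis(host=REDIS_HOST, port=6379)


def upload_weights(model_dir: str) -> None:
    """One-time step: serialize the state dict and store it as chunked keys."""
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    blob = buf.getvalue()
    chunk_starts = range(0, len(blob), CHUNK_SIZE)
    for i, start in enumerate(chunk_starts):
        r.set(f"{KEY_PREFIX}:{i}", blob[start:start + CHUNK_SIZE])
    r.set(f"{KEY_PREFIX}:count", len(chunk_starts))


def load_model(config_dir: str):
    """At cold start: fetch the chunks, rebuild the blob, load the weights."""
    count = int(r.get(f"{KEY_PREFIX}:count"))
    blob = b"".join(r.get(f"{KEY_PREFIX}:{i}") for i in range(count))
    config = AutoConfig.from_pretrained(config_dir)  # config.json is tiny
    model = AutoModelForSequenceClassification.from_config(config)
    model.load_state_dict(torch.load(io.BytesIO(blob)))
    model.eval()
    return model
```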

answered 2 years ago

Hi! Thanks for your answer. In theory, that'd be a good idea and could work. However, my other question in this forum then comes into play :D

https://repost.aws/questions/QU0JnCsfMHRrSUosWjOiOM9g/feature-request-serverless-inference-with-vpc-config

Serverless Inference currently does not support a VPC configuration. Redis clusters, however, need to be in a VPC.

Richard
answered 2 years ago
