[Help/ideas wanted] Serverless Inference: Optimize cold start time

1

We are using SageMaker Serverless Inference, where the endpoint is wrapped with a Lambda that has a 30-second timeout (this timeout is not adjustable). Our model's cold start time is well above that (around 43 seconds). We load a model using Hugging Face transformers and serve it through a Flask API. The model size is around 1.75 GB.

Are there any guides on how to improve cold start and model loading time? Could we compile or repackage the weights beforehand for faster loading?

Richard
posted 2 years ago · 1,612 views
2 Answers
0

Instead of loading the model object from a zip file in the Lambda session, you can load the model object into ElastiCache upfront and read it into the Lambda instance from ElastiCache. You might need to serialize and deserialize it, but I think it would still be faster.
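As a minimal sketch of that serialize/deserialize round trip: the dict below stands in for an ElastiCache (Redis) client, since Serverless Inference can't reach a real cluster yet; in a Lambda you would swap it for `redis.Redis(...).set`/`.get` against your cache endpoint. The key name and the toy weights dict are illustrative, not from the thread.

```python
import pickle

# Stand-in for an ElastiCache/Redis client (a real one would be
# redis.Redis(host="<your-cache-endpoint>")); both expose byte get/set.
cache = {}

def put_model(key, model_obj):
    # Serialize once, outside the request path (e.g. at deploy time),
    # so the Lambda never has to unzip and parse the model artifact.
    cache[key] = pickle.dumps(model_obj, protocol=pickle.HIGHEST_PROTOCOL)

def get_model(key):
    # Deserialize inside the Lambda cold start; for a 1.75 GB model the
    # cost is one network fetch plus pickle.loads, instead of S3 + unzip.
    return pickle.loads(cache[key])

# Toy object standing in for the Hugging Face model weights.
weights = {"layer.0.weight": [0.1, 0.2], "layer.0.bias": [0.0]}
put_model("my-model", weights)
restored = get_model("my-model")
assert restored == weights
```

Note that `pickle.loads` on a multi-gigabyte payload is still not free, and the Lambda needs enough memory to hold both the serialized bytes and the deserialized model at once.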

answered 2 years ago
0

Hi! Thanks for your answer. In theory that's a good idea and could work. However, my other question in this forum then comes into play :D

https://repost.aws/questions/QU0JnCsfMHRrSUosWjOiOM9g/feature-request-serverless-inference-with-vpc-config

Serverless Inference currently does not support a VPC configuration. Redis clusters, however, need to be in a VPC.

Richard
answered 2 years ago
