[Help/ideas wanted] Serverless Inference: Optimize cold start time


We are using SageMaker Serverless Inference, where the endpoint is wrapped with a Lambda that has a 30-second timeout (this timeout is not adjustable). Our model's cold start time is well above that, at around 43 seconds. We load the model with Hugging Face transformers and serve it through a Flask API. The model is around 1.75GB.

Are there any guides on how to improve cold start and model loading time? Could we compile the weights differently beforehand for faster loading?
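On preparing the weights beforehand: one commonly used approach is a format that can be memory-mapped instead of unpickled, which is the idea behind safetensors. Recent versions of Hugging Face transformers can write it with `save_pretrained(..., safe_serialization=True)` and read it back with `from_pretrained`. The stdlib-only sketch below illustrates just the underlying mechanism on a toy float32 file; the file layout and the names `save_weights_raw` and `load_weights_mmap` are hypothetical. Whether this alone brings a 1.75GB model under the 30-second limit is untested, since cold start also includes container startup.

```python
import array
import mmap
import os
import tempfile
from typing import List, Tuple


def save_weights_raw(path: str, weights: List[float]) -> None:
    """Offline step (hypothetical layout): one contiguous block of float32 values.

    Uses the machine's native byte order, so load on the same architecture.
    """
    with open(path, "wb") as f:
        array.array("f", weights).tofile(f)


def load_weights_mmap(path: str) -> Tuple[mmap.mmap, memoryview]:
    """Map the file read-only; the OS pages data in lazily on first access.

    Nothing is parsed or unpickled here, so the whole file is not read up front.
    Release the returned view before closing the map.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reinterpret the raw bytes as float32 without copying them.
    return mm, memoryview(mm).cast("f")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "weights.bin")
        save_weights_raw(path, [0.5, 1.25, -2.0])
        mm, weights = load_weights_mmap(path)
        print(list(weights))  # [0.5, 1.25, -2.0]
        weights.release()
        mm.close()
```

The trade-off is that pages are read on first access rather than at load time, so part of the I/O cost moves into the first inference request instead of disappearing.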

Richard
asked 2 years ago, 1612 views
2 Answers

Instead of loading the model object from a zip file in the Lambda session, you can load the model object into ElastiCache upfront and then load it into the Lambda instance from ElastiCache. You might need to serialize and deserialize it, but I think it would still be faster.
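A minimal sketch of the serialize/deserialize step, assuming a Redis-style key-value store with `get`/`set`. `DictStore`, `put_object`, `get_object`, and the `prefix:chunk:N` key layout are hypothetical; `DictStore` is an in-memory stand-in, since a real ElastiCache cluster needs network access. The blob is split into chunks because a single Redis string value is capped at 512 MB, well below 1.75GB. Only unpickle data you wrote yourself, since pickle can execute arbitrary code.

```python
import pickle
from typing import Any, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical minimal interface; a Redis client's get/set has this shape."""

    def set(self, key: str, value: bytes) -> Any: ...
    def get(self, key: str) -> Optional[bytes]: ...


class DictStore:
    """In-memory stand-in for ElastiCache, for local testing only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> bool:
        self._data[key] = value
        return True

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def put_object(store: KeyValueStore, prefix: str, obj: Any, chunk_size: int) -> int:
    """Pickle obj, store it in chunk_size pieces, and return the chunk count."""
    blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    for i, chunk in enumerate(chunks):
        store.set(f"{prefix}:chunk:{i}", chunk)
    # Write the count last so readers never see a partially written object.
    store.set(f"{prefix}:count", str(len(chunks)).encode())
    return len(chunks)


def get_object(store: KeyValueStore, prefix: str) -> Any:
    """Fetch all chunks for prefix, reassemble them, and unpickle the result."""
    raw = store.get(f"{prefix}:count")
    if raw is None:
        raise KeyError(prefix)
    parts = []
    for i in range(int(raw)):
        part = store.get(f"{prefix}:chunk:{i}")
        if part is None:
            raise KeyError(f"{prefix}:chunk:{i}")
        parts.append(part)
    return pickle.loads(b"".join(parts))
```

Deserialization still has to rebuild the model in memory, so this should be timed against loading from the model archive before relying on it.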

answered 2 years ago

Hi! Thanks for your answer. In theory, that would be a good idea and could work. However, this is where my other question in this forum comes into play :D

https://repost.aws/questions/QU0JnCsfMHRrSUosWjOiOM9g/feature-request-serverless-inference-with-vpc-config

Serverless Inference currently does not support a VPC configuration. Redis clusters, however, need to be in a VPC.

Richard
answered 2 years ago
