[Help/ideas wanted] Serverless Inference: Optimize cold start time

1

We are using Sagemaker Serverless Inference, where the endpoint is wrapped with a Lambda that has a 30sec timeout (this timeout is not adjustable). Our cold start time of the model is quite above that (around 43sec). We load a model using Huggingface transformers and have a FLASK API for serving the model. The model size is around 1.75GB.

Are there any guides on how to improve cold start and model loading time? Could we compile the weights differently beforehand for faster loading?

Richard
質問済み 2年前1612ビュー
2回答
0

instead of loading model object from a zip file in lambda session. you can load the model object to elastic-cache upfront and load it in lambda instance from elastic-cache. you might need to serialize and deserialize but I think it would still be faster.

回答済み 2年前
0

Hi! Thanks for your answer. In theory, that'd be a good idea and could work. However, my other question in this forum then comes into play :D

https://repost.aws/questions/QU0JnCsfMHRrSUosWjOiOM9g/feature-request-serverless-inference-with-vpc-config

Serverless Inference currently does not support a VPC configuration. Redis clusters, however, need to be in a VPC.

Richard
回答済み 2年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ