"Failure reason Image size 12704675783 is greater than supported size 10737418240" when creating serverless endpoint in SageMaker.


How to reproduce the error: We want to run Python inference in SageMaker. Because our model is pre-trained outside SageMaker and has some special logic, we need to create a custom image. We followed the document https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html#prebuilt-containers-extend-tutorial and used 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker as the base image. We wrote a Dockerfile, ran "docker build" to create a new image, and ran "docker push" to push it to Amazon ECR as 935877503070.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:testaisage.

Then we followed the document https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints-create.html. In the SageMaker console (https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/models) we created a model, entering "935877503070.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:testaisage" as the "Location of inference code image". Then we created an endpoint configuration and an endpoint. But the endpoint shows "Failure reason Image size 12704675783 is greater than supported size 10737418240".
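For reference, the same console steps can be expressed with the boto3 SageMaker API. This is only a minimal sketch of the flow that hits the error - the execution role ARN and the model/endpoint names below are placeholders, and only the image URI is taken from the steps above:

    import boto3

    sm = boto3.client("sagemaker", region_name="us-east-1")

    image_uri = "935877503070.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:testaisage"
    role_arn = "arn:aws:iam::935877503070:role/MySageMakerExecutionRole"  # placeholder

    # Create the model pointing at the custom inference image.
    sm.create_model(
        ModelName="my-custom-model",
        PrimaryContainer={"Image": image_uri},
        ExecutionRoleArn=role_arn,
    )

    # Create an endpoint configuration with a ServerlessConfig instead of instance settings.
    sm.create_endpoint_config(
        EndpointConfigName="my-serverless-config",
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": "my-custom-model",
                "ServerlessConfig": {"MemorySizeInMB": 4096, "MaxConcurrency": 5},
            }
        ],
    )

    # Create the endpoint; this is the step that fails with the image-size error.
    sm.create_endpoint(
        EndpointName="my-serverless-endpoint",
        EndpointConfigName="my-serverless-config",
    )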

2 Answers

As the error message indicates, SageMaker Serverless Inference imposes a 10 GiB (10737418240 bytes) limit on your deployed container image size, which helps deliver quality of service for considerations like cold-start time. From a quick look I didn't see this mentioned in the SageMaker serverless docs, but as mentioned in the launch blog post, SageMaker Serverless Inference is backed by AWS Lambda, and the AWS Lambda quotas page lists the limit.

So to solve the issue (and still use SageMaker Serverless Inference), you'll need to look at optimizing that container image size by removing any unnecessary bloat - from the number you posted, you need to shave off 12704675783 - 10737418240 = 1967257543 bytes, a little under 2 GiB.

Some suggestions on that:

  • Are you currently building your actual model into the image itself? The typical pattern on SageMaker is to host a model.tar.gz tarball on S3, which gets downloaded and extracted into your container at runtime. For large language models and similar, this can be a big size saving (although of course, optimizing the overall S3 + image size can still help give you the best start-up times). The contents of this file are flexible, so you could offload multiple artifacts.
  • I saw you're using the standard PyTorch DLC as a base... Are you replacing the entire serving stack, or slotting your custom logic into the one the DLC provides? The stack already shipped in the PyTorch container supports (see docs here) customization of model loading via model_fn, input de-serialization via input_fn, output serialization via output_fn, and the actual prediction via predict_fn. The APIs between these user-defined functions are very flexible (for example, you can return pretty much whatever you like from model_fn, so long as predict_fn knows how to use it), so I find in practice it can support even complex requirements like custom request formats, pipelining multiple models together, advanced pre-processing, etc. I've seen some customers go straight to building custom serving stacks (installing their dependencies alongside the existing e.g. TorchServe in the image) before realising that the pre-built stack could already support what they needed. Again, this inference.py script would live in your model.tar.gz - see the sketch after this list.
  • General, non-SageMaker-specific container image optimization guidelines still apply: for example, you might see the AWS DLCs clearing apt caches in the same RUN command that performs the apt installs. If you find yourself really struggling with the size of the base AWS DLC, you could look into building from scratch / another base and installing everything you need... but of course you'd need to do the due diligence to check you're including everything you need and that it's well optimized.
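To make the second point concrete, here is a minimal sketch of what such an inference.py could look like - the model file name ("model.pt"), the use of TorchScript, and the JSON request format are assumptions for illustration, not requirements of the serving stack. It would typically be packaged under a code/ directory inside your model.tar.gz:

    # inference.py - user-defined hooks picked up by the PyTorch DLC serving stack.
    import json
    import os

    import torch


    def model_fn(model_dir):
        # Load whatever artifact you packaged in model.tar.gz (a TorchScript model here).
        model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
        model.eval()
        return model


    def input_fn(request_body, content_type):
        # Deserialize the request into whatever predict_fn expects.
        if content_type == "application/json":
            payload = json.loads(request_body)
            return torch.tensor(payload["inputs"])
        raise ValueError(f"Unsupported content type: {content_type}")


    def predict_fn(input_data, model):
        # Custom prediction logic goes here.
        with torch.no_grad():
            return model(input_data)


    def output_fn(prediction, accept):
        # Serialize the prediction back to the client.
        return json.dumps({"predictions": prediction.tolist()})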
AWS EXPERT Alex_T, answered 2 years ago
  • Thanks for your reply. Yes, what I want to do is just write our own "model_fn" and "predict_fn" functions. You said these functions should be in inference.py and inside model.tar.gz, but I only found the document https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html, which says these functions should be written into the Docker container. Is there any document about the structure of the model.tar.gz file? And which file in model.tar.gz will be run? Thank you.


You need a smaller container image. Also, take into consideration that at the moment SageMaker serverless endpoints do not support GPU acceleration (see https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html#serverless-endpoints-how-it-works-exclusions).
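If it helps to double-check the size you pushed, here is a quick sketch against the ECR API (repository name and tag taken from the question; note that ECR reports the compressed image size, which can differ from the size SageMaker validates at deployment):

    import boto3

    ecr = boto3.client("ecr", region_name="us-east-1")
    response = ecr.describe_images(
        repositoryName="pytorch-inference",
        imageIds=[{"imageTag": "testaisage"}],
    )
    size_bytes = response["imageDetails"][0]["imageSizeInBytes"]
    print(f"ECR-reported image size: {size_bytes / (1024 ** 3):.2f} GiB")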

AWS EXPERT Tasio, answered 2 years ago
