Set up a serverless endpoint with custom inference code on AWS SageMaker


I'm using a Hugging Face model locally to return an image embedding vector instead of using the standard pipeline, which returns a zero-shot image classification response. I'm trying to get this to work on SageMaker Serverless.

from PIL import Image
from transformers import CLIPProcessor, CLIPModel
import numpy as np

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

def get_image_vector(image: Image.Image) -> np.ndarray:
    # Preprocess the PIL image into the pixel tensors CLIP expects
    inputs = processor(images=image, return_tensors="pt", padding=True)
    # Return the image embedding instead of zero-shot classification scores
    image_vector = model.get_image_features(**inputs).squeeze().detach().numpy()
    return image_vector

I've read a lot about getting a standard Hugging Face model working, but I'm struggling to figure out how to do this with CLIPProcessor and CLIPModel on an Amazon SageMaker serverless endpoint. I hope someone here can help me! Thanks!

  • Hello Messi, could you explain in more detail why this is challenging, e.g. the error message you are getting or a specific gap in the API specification?

asked 6 months ago · 210 views
1 Answer

One key benefit of SageMaker Serverless over a DIY setup in Lambda is that the model packaging and deployment experience should be much the same whether you ultimately deploy to serverless, real-time, async, or batch inference, subject to a few extra constraints for each deployment type. Notably, serverless does not currently support GPU acceleration, so inference may run slowly for deep learning models, and requests will time out if they take longer than 60 seconds.

So if you find relevant tutorials for real-time endpoints (or other SageMaker deployment types) for your particular model, switching them to serverless may only require a minor change to the deploy(...) call.
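To illustrate how small that change can be, here is a minimal sketch using the SageMaker Python SDK's HuggingFaceModel and ServerlessInferenceConfig. The framework versions, the 6144 MB memory size, the max_concurrency value and the ml.m5.xlarge instance type are all assumptions: pick a supported Hugging Face container combination and sizes that suit your model. The helper names build_model and deploy_endpoint are hypothetical. SDK imports are kept inside the functions so the file loads even where the SDK isn't installed.

```python
def build_model(model_data_s3_uri, role_arn):
    """Create a HuggingFaceModel from a model.tar.gz that contains your code/ folder."""
    from sagemaker.huggingface import HuggingFaceModel

    return HuggingFaceModel(
        model_data=model_data_s3_uri,  # S3 URI of your packaged model archive
        role=role_arn,                 # IAM execution role for the endpoint
        transformers_version="4.26",   # assumed versions: use a supported combination
        pytorch_version="1.13",
        py_version="py39",
    )


def deploy_endpoint(model, serverless=True, memory_size_in_mb=6144, max_concurrency=1):
    """Deploy the same model object either as serverless or as a real-time endpoint."""
    if serverless:
        from sagemaker.serverless import ServerlessInferenceConfig

        config = ServerlessInferenceConfig(
            memory_size_in_mb=memory_size_in_mb,  # assumed size; CLIP may need less
            max_concurrency=max_concurrency,
        )
        # Serverless: no instance type or count, just the serverless config
        return model.deploy(serverless_inference_config=config)
    # Real-time: only the deploy() arguments differ
    return model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```

Only the deploy() arguments change between the two branches; the model archive and handler code stay the same.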

Without specifics on where you're struggling, it's difficult to be exact, but my starting suggestion would be:

  • Create a folder structure [as described here] like model/code/
    • You don't need to include local copies of your source model and processor if you'd rather load them from the HF Hub, but packaging a copy may be good practice.
  • In your inference script, define a model_fn that loads your model and processor and returns both.
    • If you chose to package your model artifacts in the folder, the function receives the path to the local folder where they have already been fetched. If you chose to fetch them from the HF Hub, you can ignore this parameter.
  • Define your input pre-processing, prediction, and output post-processing in input_fn, predict_fn, and output_fn.
  • Add a requirements.txt in the same folder as your inference script if you need to install specific versions of other libraries.
  • From your Python notebook/environment, create your HuggingFaceModel and deploy() it as documented here, but specify a serverless_inference_config to indicate that you want a serverless endpoint.
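The handler steps above might be sketched like this for the fashion-clip model. It assumes the Hugging Face inference toolkit's override hooks (model_fn, input_fn, predict_fn, output_fn) with their usual signatures, requests sent as raw image/jpeg or image/png bytes, and a hypothetical JSON response shape of {"vector": [...]}. ML imports sit inside the functions only so the module can be imported without them; in the real container they could equally go at the top.

```python
import io
import json

MODEL_ID = "patrickjohncyh/fashion-clip"
SUPPORTED_IMAGE_TYPES = ("image/jpeg", "image/png")


def model_fn(model_dir):
    """Load the CLIP model and processor once per container and return both."""
    from transformers import CLIPModel, CLIPProcessor

    # Loads from the Hugging Face Hub. If you packaged the model and processor
    # files into your model archive, pass model_dir here instead of MODEL_ID.
    model = CLIPModel.from_pretrained(MODEL_ID)
    processor = CLIPProcessor.from_pretrained(MODEL_ID)
    model.eval()
    return model, processor


def input_fn(request_body, request_content_type):
    """Decode raw JPEG/PNG request bytes into an RGB PIL image."""
    if request_content_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported content type: {request_content_type}")
    from PIL import Image

    return Image.open(io.BytesIO(request_body)).convert("RGB")


def predict_fn(image, model_and_processor):
    """Return the CLIP image embedding as a plain list of floats."""
    import torch

    model, processor = model_and_processor
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        features = model.get_image_features(**inputs)
    return features.squeeze(0).tolist()


def output_fn(prediction, accept):
    """Serialize the embedding as JSON using a hypothetical {"vector": [...]} schema."""
    if accept not in ("application/json", "*/*"):
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps({"vector": prediction})
```

Returning a list from predict_fn keeps output_fn free of NumPy/PyTorch types, which json.dumps cannot serialize directly.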

You'll need to decide how to pass your image to the endpoint. Maybe it's an application/json request with a pointer to where the image has already been uploaded to Amazon S3, and your input_fn retrieves the data from that location? Maybe you're passing raw image/jpeg or image/png bytes? I strongly recommend against passing pixel array/tensor data over HTTP(S), because it's very inefficient compared to compressed image formats like JPEG, and you might hit the 5MB payload size limit sooner than you expect.
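Here is a sketch of an input_fn that accepts either option: a JSON pointer to an object in S3, or raw JPEG/PNG bytes. The JSON field name s3_uri and the helper split_s3_uri are hypothetical. Fetching from S3 means the endpoint's execution role needs s3:GetObject on that bucket, and boto3 may need adding to requirements.txt if the container doesn't already ship it.

```python
import io
import json


def split_s3_uri(uri):
    """Split "s3://bucket/key" into (bucket, key); raise ValueError otherwise."""
    if not uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must include a bucket and an object key: {uri}")
    return bucket, key


def input_fn(request_body, request_content_type):
    """Return an RGB PIL image from either an S3 pointer or raw image bytes."""
    if request_content_type == "application/json":
        # Hypothetical request schema: {"s3_uri": "s3://my-bucket/path/to/image.jpg"}
        payload = json.loads(request_body)
        bucket, key = split_s3_uri(payload["s3_uri"])
        import boto3  # add to requirements.txt if the container lacks it

        s3 = boto3.client("s3")
        image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    elif request_content_type in ("image/jpeg", "image/png"):
        # Compressed image bytes sent directly in the request body
        image_bytes = request_body
    else:
        raise ValueError(f"Unsupported content type: {request_content_type}")

    from PIL import Image

    return Image.open(io.BytesIO(image_bytes)).convert("RGB")
```

The S3 route keeps the request tiny regardless of image size, while raw bytes avoid an extra S3 round trip per request.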

If you're testing your deployed endpoints from a Python notebook/environment using the SageMaker Python SDK, be aware that serializers configure how the SDK passes your input data to the endpoint over HTTP(S). If you want to pass raw JPEG bytes, I believe something like serializer=DataSerializer(content_type="image/jpeg") should work, so long as your endpoint's handler code knows how to read image/jpeg requests.
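A minimal client-side sketch of that, assuming the SDK's Predictor with DataSerializer (which sets the Content-Type header) and JSONDeserializer (which requests and parses application/json). The helpers make_predictor and embed_image are hypothetical, and so is the {"vector": [...]} response shape, which must match whatever your output_fn actually returns.

```python
def make_predictor(endpoint_name):
    """Build an SDK Predictor that sends raw image bytes and parses JSON back."""
    from sagemaker.deserializers import JSONDeserializer
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import DataSerializer

    return Predictor(
        endpoint_name=endpoint_name,
        serializer=DataSerializer(content_type="image/jpeg"),  # Content-Type: image/jpeg
        deserializer=JSONDeserializer(),  # Accept: application/json
    )


def embed_image(predictor, image_path):
    """Send one JPEG file to the endpoint and return its embedding vector."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()  # compressed JPEG bytes, not a pixel array
    response = predictor.predict(image_bytes)
    # Assumes the endpoint returns {"vector": [...]} (hypothetical schema)
    return response["vector"]
```

Usage would look like embed_image(make_predictor("your-endpoint-name"), "shirt.jpg"), where both names are placeholders for your own endpoint and file.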

answered 3 months ago
