One key benefit of SageMaker Serverless Inference over a DIY deployment in Lambda is that the model packaging and deployment experience is pretty much the same regardless of whether you ultimately deploy to serverless, real-time, async, or batch inference, subject to a few extra constraints for each deployment type. Notably, serverless does not currently support GPU acceleration, so inference might run slowly for deep learning models - and will time out if a request takes longer than 60 seconds!

So if you come across any relevant tutorials for real-time endpoints or other deployment types on SageMaker for your particular models, swapping them to serverless may be just a minor change in the `deploy(...)` call.
Without specifics on where you're struggling, it's difficult to be exact, but my starting suggestion would be:
- Create a folder structure [as described here](https://huggingface.co/docs/sagemaker/inference#user-defined-code-and-modules), like `model/code/inference.py`
- You don't need to include local copies of your source model & processor if you'd rather load them from the HF Hub, but taking a copy might be a good practice.
- In your `inference.py`, define a `model_fn` that loads your model and processor and returns both. If you chose to pack your model artifacts in the folder, the function will receive the local folder where they've already been fetched; if you chose to fetch them from the HF Hub instead, you can ignore this parameter.
- Define your input pre-processing, prediction, and output post-processing in `input_fn`, `predict_fn`, and `output_fn` (a minimal `inference.py` sketch covering these functions follows after this list)
- Add a `requirements.txt` in the same folder as your `inference.py` if you find you need to install specific versions of other libraries
- From your Python notebook/environment, create your `HuggingFaceModel` and `deploy()` it as documented here, but specify a `serverless_inference_config` to indicate you want a serverless endpoint (see the deployment sketch further below).
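To make the `inference.py` part more concrete, here's a minimal sketch assuming an image classification model and processor. The function names and signatures (`model_fn`, `input_fn`, `predict_fn`, `output_fn`) are the ones the Hugging Face inference toolkit looks for, but the specific classes (`AutoImageProcessor`, `AutoModelForImageClassification`) and the JSON output format are illustrative assumptions - swap in whatever your model actually needs:

```python
# model/code/inference.py - minimal sketch for an image classification model.
# Signatures follow the Hugging Face SageMaker inference toolkit; the model/processor
# classes and the output format are assumptions for illustration only.
import io
import json

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification


def model_fn(model_dir):
    # model_dir is the extracted model.tar.gz folder; ignore it if you load from the HF Hub instead
    processor = AutoImageProcessor.from_pretrained(model_dir)
    model = AutoModelForImageClassification.from_pretrained(model_dir)
    model.eval()
    return model, processor


def input_fn(request_body, content_type):
    # Accept raw compressed image bytes rather than pixel arrays/tensors
    if content_type in ("image/jpeg", "image/png"):
        return Image.open(io.BytesIO(request_body)).convert("RGB")
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(image, model_and_processor):
    model, processor = model_and_processor
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    label_id = logits.argmax(-1).item()
    return {"label": model.config.id2label[label_id]}


def output_fn(prediction, accept):
    # Return a JSON string; adjust if your client expects a different format
    return json.dumps(prediction)
```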
You'll need to decide how you want to pass your image to the endpoint: maybe it's an `application/json` request with a pointer to where the image has already been uploaded in Amazon S3, and your `input_fn` should retrieve the data from that location? Maybe you're passing raw `image/jpeg` or `image/png` bytes? I strongly recommend avoiding passing pixel array/tensor data over HTTP(S), because it's very inefficient compared to proper compressed image formats like JPEG - and you might hit the request payload size limit (4 MB for serverless endpoints) sooner than you expect.
If you're testing your deployed endpoint from a Python notebook/environment using the SageMaker Python SDK, be aware that serializers are what configure how the SDK passes your input data to the endpoint over HTTP(S). If you want to pass raw JPEG bytes, then I believe something like `serializer=DataSerializer(content_type="image/jpeg")` should work... as long as you created your endpoint with an `inference.py`/`input_fn` that knows how to read `image/jpeg` requests.
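Putting it together, a rough deployment-and-test sketch could look like the snippet below. The S3 path, IAM role, framework versions, and memory/concurrency settings are all placeholders/assumptions - substitute your own values and DLC versions supported by your SageMaker SDK:

```python
# Rough sketch: deploy the packaged model to a serverless endpoint and send it a local JPEG.
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.serializers import DataSerializer
from sagemaker.deserializers import JSONDeserializer

model = HuggingFaceModel(
    model_data="s3://<your-bucket>/model.tar.gz",  # archive containing your model files + code/inference.py
    role="<your-sagemaker-execution-role>",
    transformers_version="4.26",  # pick versions with a matching Hugging Face DLC
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,  # tune to your model size
        max_concurrency=5,
    ),
    serializer=DataSerializer(content_type="image/jpeg"),  # send raw JPEG bytes
    deserializer=JSONDeserializer(),  # parse the JSON returned by output_fn
)

# DataSerializer accepts a local file path (or raw bytes) and sends the file contents as the request body:
print(predictor.predict("test-image.jpg"))
```

If your `output_fn` returns something other than JSON, swap the deserializer accordingly.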
Hello Messi, could you add some more explanation of why this is challenging, e.g. the error message you are getting, or the specific gap with the API specification?