Host a fine-tuned BERT Multilingual model on SageMaker with Serverless inference
Hi All,
Good day!!
The key point to note here is that we have a pre-processing script that deserializes the text document as required for prediction, and a post-processing script that generates the NER entities.
I went through the SageMaker material and decided to try the following options.
Option 1: Bring our own model, write an inference script, and deploy it on a SageMaker real-time endpoint using the PyTorch container. I went through Suman's video (https://www.youtube.com/watch?v=D9Qo5OpG4p8), which is really good; I still need to try it with our pre- and post-processing scripts to see whether it works.
Option 2: Bring our own model, write an inference script, and deploy it on a SageMaker real-time endpoint using the Hugging Face container. I went through the Hugging Face docs (https://huggingface.co/docs/sagemaker/inference#deploy-a-%F0%9F%A4%97-transformers-model-trained-in-sagemaker), but there is no reference for how to use our own pre- and post-processing scripts to set up the inference pipeline.
If you know of any examples of using our own pre- and post-processing scripts with the Hugging Face container, please share them.
Option 3: Bring our own model, write an inference script, and deploy it on a SageMaker serverless endpoint using the Hugging Face container. I went through Julien's video (https://www.youtube.com/watch?v=cUhDLoBH80o&list=PLJgojBtbsuc0E1JcQheqgHUUThahGXLJT&index=35), which is excellent, but he has not shown how to use our own pre- and post-processing scripts with the Hugging Face container.
Please share any examples you know of. Could you please help?
Thanks, Vinayak
Hi Vinayak, thanks for opening this thread.
This is the official documentation, which describes how the inference script for pre- and post-processing should be structured: https://huggingface.co/docs/sagemaker/inference#user-defined-code-and-modules.
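To give you an idea, here is a rough sketch of what such an inference.py could look like for your NER use case. It follows the model_fn/predict_fn hooks from the documentation above; the whitespace clean-up and the entity fields I return are just placeholders for your own pre- and post-processing logic:

```python
# code/inference.py -- a minimal sketch, not production code.
# Assumes model.tar.gz contains a fine-tuned BERT token-classification
# model saved with save_pretrained(); adapt it to your artefacts.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline


def model_fn(model_dir):
    """Load the model and tokenizer once when the container starts."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    # aggregation_strategy="simple" merges word pieces into whole entities
    return pipeline("ner", model=model, tokenizer=tokenizer,
                    aggregation_strategy="simple")


def predict_fn(data, ner_pipeline):
    """Custom pre-processing (cleaning the raw document text) and
    post-processing (shaping the NER entities) both live here.
    The default input_fn has already deserialized the JSON request."""
    text = data.pop("inputs", "")
    # --- pre-processing: e.g. normalise whitespace in the document ---
    text = " ".join(text.split())
    entities = ner_pipeline(text)
    # --- post-processing: keep only the fields needed downstream;
    # cast the numpy score to float so it is JSON-serializable ---
    return [{"entity": e["entity_group"], "word": e["word"],
             "score": float(e["score"])} for e in entities]
```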
And here is a simple example of an inference script with pre- and post-processing: https://github.com/marshmellow77/text-summarisation-project/blob/main/inference_code/inference.py
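Once the script is packaged under code/ inside your model.tar.gz, deploying it to a serverless endpoint with the Hugging Face container could look roughly like this. The S3 path is a placeholder, and the framework versions and memory size are assumptions you will need to adapt to your model:

```python
# Deployment sketch for a serverless endpoint with the Hugging Face DLC.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

role = sagemaker.get_execution_role()

# model.tar.gz layout: model files at the root, inference.py under code/
huggingface_model = HuggingFaceModel(
    model_data="s3://<your-bucket>/model.tar.gz",  # placeholder path
    role=role,
    transformers_version="4.26",  # pick versions matching your model
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,  # BERT multilingual needs some headroom
        max_concurrency=5,
    ),
)

# The request dict is what predict_fn receives after deserialization.
print(predictor.predict({"inputs": "Il a plu à Paris hier."}))
```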
Hope that helps, and please reach out if you have any questions.
Thanks, Heiko