2 Answers
Hi,
This post says that streaming was not supported as of May-June: https://discuss.huggingface.co/t/streaming-output-text-when-deploying-on-sagemaker/39611/5
Best,
Didier
Wow, good timing to ask...
Yes, SageMaker does now support response streaming as of just last Friday. You can find the:
- AWS Blog post with example walkthroughs (sample code available on GitHub)
- What's new post including links to the relevant API docs and developer guide pages
As mentioned in the blog, you'll probably want to use the AWS Large Model Inference (LMI) or Hugging Face Text Generation Inference (TGI) containers to take advantage of this feature... The 'vanilla' framework containers for e.g. PyTorch and Hugging Face (particularly when used in script mode with an inference.py, custom input_fn, predict_fn, etc.) use some synchronous patterns that I'm not sure translate nicely to streaming contexts so far.
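For reference, here's a minimal sketch of consuming a streaming response client-side with boto3's invoke_endpoint_with_response_stream. The endpoint name and the request payload shape (written here the way a TGI container expects it) are assumptions you'd adapt to your own deployment:

```python
# Minimal sketch: read a SageMaker streaming response with boto3.
# Assumes an already-deployed streaming-capable endpoint (e.g. a TGI
# container); "my-llm-endpoint" and the payload shape are placeholders.
import json

import boto3

smr = boto3.client("sagemaker-runtime")

response = smr.invoke_endpoint_with_response_stream(
    EndpointName="my-llm-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "Explain response streaming in one sentence.",
        "parameters": {"max_new_tokens": 128},  # TGI-style parameters
    }),
)

# The response Body is an event stream; each event carries a PayloadPart
# holding a chunk of raw bytes from the model container.
for event in response["Body"]:
    part = event.get("PayloadPart")
    if part:
        print(part["Bytes"].decode("utf-8"), end="", flush=True)
```

Note the chunks arrive as raw bytes, so depending on the container you may still need to reassemble them into lines or JSON events before parsing.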
Yup - my understanding is this is still correct today. EDIT: But not a week later!