Does the SageMaker Inference Endpoint support SSE (Streaming Response, text/event-stream)?

0

For now, I use a SageMaker inference endpoint to deploy an LLM, with the TGI container providing the service. I now want to use TGI's streaming function, but I couldn't find any mention of it in the AWS documentation. I'd like to confirm whether it is supported. If so, how should it be used with the SDK? If not, is there a workaround?

Thanks

Eastwoo
asked 8 months ago · 443 views
2 Answers
1

Hi,

This post says that it was not supported as of May-June: https://discuss.huggingface.co/t/streaming-output-text-when-deploying-on-sagemaker/39611/5

Best,

Didier

AWS
EXPERT
answered 8 months ago
AWS
EXPERT
Alex_T
reviewed 8 months ago
  • Yup, my understanding is this is still correct today. EDIT: But not a week later!

0

Wow, good timing to ask...

Yes, SageMaker does now support response streaming, as of just last Friday. You can find the details in the launch announcement blog post.

As mentioned in the blog, you'll probably want to use the AWS Large Model Inference (LMI) or Hugging Face Text Generation Inference (TGI) containers to take advantage of this feature. The 'vanilla' framework containers, e.g. for PyTorch and Hugging Face (particularly when used in script mode with an inference.py and custom input_fn, predict_fn, etc.), use some synchronous patterns that I'm not sure translate nicely to streaming contexts so far. See the sketch below for how the SDK side looks.
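For the SDK side, here's a minimal sketch of consuming the stream with boto3's invoke_endpoint_with_response_stream API. The endpoint name and prompt are placeholders, and it assumes a TGI container that switches to server-sent-event output when "stream": true is set in the request body; in real code you'd also want to buffer the payload parts, since a single SSE event can be split across chunk boundaries:

```python
import json

import boto3

# Placeholder name -- replace with your own TGI endpoint.
ENDPOINT_NAME = "my-tgi-endpoint"

smr = boto3.client("sagemaker-runtime")

# Assumption: the TGI container emits SSE output when "stream" is true.
body = {
    "inputs": "What is streaming inference?",
    "parameters": {"max_new_tokens": 128},
    "stream": True,
}

response = smr.invoke_endpoint_with_response_stream(
    EndpointName=ENDPOINT_NAME,
    Body=json.dumps(body),
    ContentType="application/json",
)

# response["Body"] is an event stream of PayloadPart chunks; each carries
# raw bytes of the SSE lines emitted by the container ("data: {...}").
# Note: a chunk boundary can fall mid-event, so production code should
# accumulate bytes and split on the SSE delimiter before parsing.
for event in response["Body"]:
    chunk = event.get("PayloadPart", {}).get("Bytes")
    if chunk:
        print(chunk.decode("utf-8"), end="")
```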

AWS
EXPERT
Alex_T
answered 8 months ago
