SageMaker Inference Recommender - ModelLatency for streaming responses

I have an inference endpoint that returns an HTTP streaming response, and I would like to load test it.

Does the ModelLatency metric in the Inference Recommender results refer to the time to receive the first chunk, or the time to receive all chunks?

cf. https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-interpret-results.html
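
To make the two candidate definitions concrete, here is a minimal client-side sketch that times both of them for a single streamed response. It is only an illustration: `measure_stream_latency` and `open_stream` are hypothetical names, and `open_stream` stands in for whatever call sends the request and returns an iterable of chunks. These are client-side timings, so they say nothing about how ModelLatency itself is computed.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream_latency(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Time one streamed response from the client side.

    open_stream is a hypothetical callable (an assumption, not an AWS API)
    that sends the request and returns an iterable of response chunks.

    Returns (seconds to first chunk or None if the stream was empty,
             seconds until the last chunk was consumed,
             number of chunks received).
    """
    start = clock()  # started before the request is sent
    time_to_first_chunk: Optional[float] = None
    chunk_count = 0

    for _ in open_stream():
        if time_to_first_chunk is None:
            # Candidate definition 1: time to receive the first chunk.
            time_to_first_chunk = clock() - start
        chunk_count += 1

    # Candidate definition 2: time to receive all chunks.
    time_to_all_chunks = clock() - start
    return time_to_first_chunk, time_to_all_chunks, chunk_count


if __name__ == "__main__":
    def fake_stream() -> Iterable[bytes]:
        # Simulated endpoint: three chunks, each delayed slightly.
        for part in (b"hello", b" ", b"world"):
            time.sleep(0.05)
            yield part

    first, total, n = measure_stream_latency(fake_stream)
    print(f"first chunk: {first:.3f}s, all {n} chunks: {total:.3f}s")
```

Against a real endpoint, `open_stream` would wrap the streaming invocation and yield the bytes of each chunk. Comparing both numbers with the reported ModelLatency under load could hint at which definition applies.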

Gabriel
asked 5 months ago · 50 views
No Answers
