SageMaker Inference recommender - Model latency for streaming response


I have an inference endpoint that returns an HTTP streaming response, and I would like to load test it.

Does ModelLatency in the recommender metrics refer to time to receive the first chunk, or time to receive all chunks?

cf. https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-interpret-results.html
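For context, the distinction matters because the two numbers can differ by the full generation time. Below is a minimal, hedged sketch of how I measure the two client-side; `stream_chunks` is a hypothetical stand-in for the event iterator returned by a streaming invocation (e.g. the body of `invoke_endpoint_with_response_stream`), simulated locally here so the timing logic is self-contained:

```python
import time

def stream_chunks():
    # Hypothetical stand-in for a streaming response body iterator;
    # each chunk arrives after a short delay, simulating generation.
    for _ in range(3):
        time.sleep(0.05)
        yield b"chunk"

start = time.perf_counter()
time_to_first_chunk = None
for chunk in stream_chunks():
    if time_to_first_chunk is None:
        # Latency until the first chunk arrives (TTFB-style metric).
        time_to_first_chunk = time.perf_counter() - start
# Latency until the stream is fully consumed.
time_to_all_chunks = time.perf_counter() - start

print(time_to_first_chunk, time_to_all_chunks)
```

If ModelLatency is the first measurement, the recommender results would understate end-to-end response time for long generations; if it is the second, they would overstate perceived responsiveness.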

Gabriel
Asked 6 months ago · Viewed 54 times
No answers
