I have an inference endpoint that returns an HTTP streaming response, and I would like to load test it.
Does ModelLatency in the Inference Recommender metrics refer to the time to receive the first chunk, or the time to receive all chunks?
Cf. the ModelLatency definition at https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-interpret-results.html
The following links may help you understand ModelLatency in more detail: https://aws.amazon.com/blogs/machine-learning/best-practices-for-load-testing-amazon-sagemaker-real-time-inference-endpoints/ and https://repost.aws/knowledge-center/sagemaker-endpoint-latency. In particular, note how ModelLatency and OverheadLatency are defined.
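For a streaming endpoint it is often useful to measure both numbers yourself during a load test: time to first chunk and total time to drain the stream. Below is a minimal sketch of a chunk timer that works with any iterator of response chunks; the names (`time_chunks`) and the idea of wrapping your HTTP client's streaming iterator are my own illustration, not part of the SageMaker API.

```python
import time
from typing import Iterable, Tuple


def time_chunks(chunks: Iterable[bytes]) -> Tuple[float, float]:
    """Consume a streaming response and return
    (time_to_first_chunk, total_time) in seconds.

    A metric like ModelLatency covers the whole request; for a
    streaming endpoint you usually want to track both values.
    """
    start = time.perf_counter()
    first = None
    for _ in chunks:
        if first is None:
            # Record latency of the very first chunk only.
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no chunks")
    return first, total
```

In practice you would feed it the chunk iterator from your HTTP client, e.g. `time_chunks(resp.iter_content(chunk_size=None))` with `requests` and `stream=True`, or iterate the event stream returned by the SageMaker runtime's `invoke_endpoint_with_response_stream` call; exact wiring depends on your client and endpoint.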