I have an inference endpoint that returns an HTTP streaming response, and I would like to load test it.
Does ModelLatency in the Inference Recommender metrics refer to the time to receive the first chunk, or the time to receive all chunks?
ModelLatency
cf. https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-interpret-results.html
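For context, this is roughly how I invoke the endpoint and measure client-side timings. It is only a minimal sketch using boto3; the endpoint name, content type, and payload are placeholders, and it assumes the endpoint was deployed with response streaming enabled:

```python
import json
import time

import boto3

# Placeholder values -- substitute your own endpoint and payload.
ENDPOINT_NAME = "my-streaming-endpoint"
PAYLOAD = {"inputs": "Hello, world"}

runtime = boto3.client("sagemaker-runtime")

start = time.perf_counter()
response = runtime.invoke_endpoint_with_response_stream(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(PAYLOAD),
)

time_to_first_chunk = None
for event in response["Body"]:  # EventStream of PayloadPart events
    if "PayloadPart" in event:
        if time_to_first_chunk is None:
            # Record latency until the first streamed chunk arrives.
            time_to_first_chunk = time.perf_counter() - start
        # Process event["PayloadPart"]["Bytes"] here.

total_time = time.perf_counter() - start
print(f"time to first chunk: {time_to_first_chunk:.3f}s, total: {total_time:.3f}s")
```

My question is which of these two client-side numbers (first chunk vs. full stream) ModelLatency corresponds to.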
The following links may help you understand ModelLatency in more detail: https://aws.amazon.com/blogs/machine-learning/best-practices-for-load-testing-amazon-sagemaker-real-time-inference-endpoints/ and https://repost.aws/knowledge-center/sagemaker-endpoint-latency. In particular, note how ModelLatency and OverheadLatency are defined.
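If it helps, you can also compare your client-side measurements against the ModelLatency and OverheadLatency metrics the endpoint emits to CloudWatch (namespace AWS/SageMaker, values in microseconds). A minimal sketch; the endpoint name and variant name below are placeholders:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Placeholder values -- substitute your endpoint and production variant.
ENDPOINT_NAME = "my-streaming-endpoint"
VARIANT_NAME = "AllTraffic"

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

for metric in ("ModelLatency", "OverheadLatency"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName=metric,
        Dimensions=[
            {"Name": "EndpointName", "Value": ENDPOINT_NAME},
            {"Name": "VariantName", "Value": VARIANT_NAME},
        ],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
        Statistics=["Average", "Maximum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        # Values are reported in microseconds.
        print(metric, point["Timestamp"], point["Average"], point["Maximum"])
```

Comparing these server-side numbers with what your load-test client observes for first-chunk and full-stream times should make it clear which interval ModelLatency covers for your endpoint.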