Have you read the following two blog posts?
- Maximize TensorFlow performance on Amazon SageMaker endpoints for real-time inference
- Reduce computer vision inference latency using gRPC with TensorFlow serving on Amazon SageMaker
The first explores a few parameters that you can use to maximize the performance of a TensorFlow-based SageMaker real-time endpoint. In essence, these parameters overprovision the serving processes and adjust their parallel processing capabilities. As the tables in that post show, this overprovisioning and adjustment leads to better resource utilization and higher throughput, in some cases by as much as 1,000%.
The second demonstrates how to reduce model serving latency for TensorFlow computer vision models on SageMaker via in-server gRPC communication, yielding a 75% latency improvement in the example shown.
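As a rough illustration of the kind of tuning the first post describes, here is a minimal sketch that builds the environment variables for the SageMaker TensorFlow Serving container. The variable names are the ones I recall from the container's documentation, so please verify them against the post; the helper `build_tfs_env` and all the values are hypothetical and should be tuned for your instance type.

```python
def build_tfs_env(tfs_instances, gunicorn_workers, inter_op_threads, intra_op_threads):
    """Return container environment variables as strings.

    Hypothetical helper: the values passed in are illustrative only,
    not recommendations. Check the variable names against the blog post.
    """
    return {
        # Number of TensorFlow Serving processes to run in the container.
        "SAGEMAKER_TFS_INSTANCE_COUNT": str(tfs_instances),
        # Number of Gunicorn workers handling incoming HTTP requests.
        "SAGEMAKER_GUNICORN_WORKERS": str(gunicorn_workers),
        # Threads used to run independent operations in parallel.
        "SAGEMAKER_TFS_INTER_OP_PARALLELISM": str(inter_op_threads),
        # Threads used inside a single operation (e.g. a large matmul).
        "SAGEMAKER_TFS_INTRA_OP_PARALLELISM": str(intra_op_threads),
    }


# Example values for an assumed 4-vCPU instance; adjust and benchmark.
env = build_tfs_env(tfs_instances=2, gunicorn_workers=4,
                    inter_op_threads=2, intra_op_threads=2)
for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

A dictionary like this would typically be passed as the `env` argument when creating the model with the SageMaker Python SDK, then benchmarked under realistic load before changing one value at a time.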
Thanks @MarkRoy for the help! I tried the suggestions, but nothing worked, and I still cannot get inference to complete in less than 2.6 seconds. Any other suggestions? Thank you!
Probably over-simple but worth checking: What's your input request format? Are you sending compressed images (e.g. 'image/jpeg') and decoding them in the endpoint? Or are you sending raw pixel tensors (which could add significant communication overhead)?
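To get a feel for that overhead, here is a small sketch that compares the size of a JSON-encoded pixel tensor against the raw byte count of the same image. It assumes a 224x224 RGB input and the `{"instances": [...]}` request shape used by the TensorFlow Serving REST API; the function names are made up for illustration, and the pixel values are random. A JPEG of a real photo would usually be much smaller than even the raw byte count.

```python
import json
import random


def json_tensor_payload_bytes(height, width, channels=3, seed=0):
    """Size in bytes of a JSON request body carrying one image as nested lists."""
    rng = random.Random(seed)
    image = [
        [[rng.randrange(256) for _ in range(channels)] for _ in range(width)]
        for _ in range(height)
    ]
    body = json.dumps({"instances": [image]})
    return len(body.encode("utf-8"))


def raw_uint8_bytes(height, width, channels=3):
    """Size in bytes of the same image stored as one byte per channel value."""
    return height * width * channels


height, width = 224, 224  # assumed input size, adjust to your model
json_size = json_tensor_payload_bytes(height, width)
raw_size = raw_uint8_bytes(height, width)
print(f"JSON tensor payload: {json_size} bytes")
print(f"Raw uint8 pixels:    {raw_size} bytes")
print(f"JSON is {json_size / raw_size:.1f}x larger than the raw pixels")
```

If your request body looks like the JSON case, sending the compressed image bytes and decoding them inside the endpoint is worth trying before any further server-side tuning.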