1 Answer
The customer does not want to spin up a separate container endpoint for each model, because the extra network hops between endpoints add latency.
I am assuming this is a pipeline scenario where different models need to be chained. If so, it's important to keep in mind that all containers in an inference pipeline run on the same EC2 instance, so that "inferences run with low latency because the containers are co-located on the same EC2 instances."[1]
Hope this is useful.
[1] https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html
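As a rough sketch, a co-located pipeline corresponds to a single CreateModel request whose `Containers` list holds the chained containers in order; SageMaker places all of them on each endpoint instance, so inter-container calls never leave the host. The model name, image URIs, S3 paths, and role ARN below are placeholders, not values from the question:

```python
# Sketch: an inference pipeline is one SageMaker model with multiple
# entries in its Containers list. Requests flow through the containers
# in order, all on the same EC2 instance.
def build_pipeline_model_request(model_name, role_arn, containers):
    """Build a CreateModel request body for a pipeline of containers.

    `containers` is an ordered list of (image_uri, model_data_url)
    pairs; inference requests pass through them in this order.
    """
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"Image": image, "ModelDataUrl": data}
            for image, data in containers
        ],
    }

# Placeholder account, images, bucket, and role for illustration only.
request = build_pipeline_model_request(
    "preprocess-then-predict",
    "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    [
        ("123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
         "s3://my-bucket/preprocess/model.tar.gz"),
        ("123456789012.dkr.ecr.us-east-1.amazonaws.com/predict:latest",
         "s3://my-bucket/predict/model.tar.gz"),
    ],
)
```

The resulting dict can then be passed to boto3, e.g. `boto3.client("sagemaker").create_model(**request)`; the SageMaker Python SDK's `PipelineModel` class wraps the same idea at a higher level.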
