Hello,
I understand that you would like to implement a SageMaker ensemble model outside of AWS and would like to gather more information on how to do so.
Model deployment in machine learning (ML) is becoming increasingly complex: you often want to deploy not just one ML model but large groups of ML models represented as ensemble workflows, where each workflow comprises multiple models. Productionizing these ensembles is challenging because they must meet various performance and latency requirements.
Amazon SageMaker supports single-instance ensembles with the Triton Inference Server. This capability allows you to run model ensembles that fit on a single instance. Behind the scenes, SageMaker leverages the Triton Inference Server to manage the ensemble on every instance behind the endpoint to maximize throughput and hardware utilization with ultra-low (single-digit milliseconds) inference latency. With Triton, you can also choose from a wide range of supported ML frameworks (including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT) and infrastructure backends, including GPUs, CPUs, and AWS Inferentia.
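For reference, deploying a Triton model repository behind a SageMaker endpoint typically looks like the following sketch with the SageMaker Python SDK. This assumes the SageMaker Triton serving container; the ECR image URI, S3 path, IAM role, and model name are placeholders you would replace with your own values:

```python
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

# Placeholders: the SageMaker Triton container image for your region, your
# packaged Triton model repository on S3, and your execution role ARN.
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
    model_data="s3://<your-bucket>/triton-model-repository.tar.gz",
    role="arn:aws:iam::<account>:role/<SageMakerExecutionRole>",
    # Assumed default-model setting used by the SageMaker Triton container.
    env={"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "ensemble_model"},
    sagemaker_session=session,
)

# A single GPU instance hosts the full ensemble via Triton.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```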
Furthermore, you can use Triton itself to implement ensembles outside of AWS, as it is not specific to AWS or SageMaker [1].
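For illustration, a Triton ensemble is defined declaratively in the model repository through a `config.pbtxt` file (protobuf text format). The sketch below is a minimal, hypothetical two-step pipeline (a `preprocess` model feeding a `classifier` model); the model names, tensor names, and shapes are placeholders, not fixed Triton conventions:

```
# Hypothetical two-step ensemble: preprocess -> classifier
name: "ensemble_model"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "SCORES" data_type: TYPE_FP32 dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "PREPROCESSED" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT" value: "PREPROCESSED" }
      output_map { key: "OUTPUT" value: "SCORES" }
    }
  ]
}
```

Triton resolves the data flow between steps from the `input_map`/`output_map` entries and schedules the sub-models accordingly, so clients see the whole pipeline as a single model.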
Triton Inference Server is designed to enable teams to deploy, run, and scale trained AI models from any framework on any GPU- or CPU-based infrastructure. It is optimized for high-performance inference at scale, with features such as dynamic batching, concurrent model execution, optimal model configuration, model ensembles, and support for streaming inputs.
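As a concrete sketch of calling such an ensemble outside of AWS, the example below uses Triton's open-source Python HTTP client (`tritonclient`) against a locally running server. The URL, model name, and tensor names/shapes are assumptions matching the hypothetical configuration above:

```python
# Minimal client sketch, assuming a Triton server on the default HTTP port
# (8000) serving the hypothetical "ensemble_model" defined earlier.
# Requires: pip install tritonclient[http]
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Tensor names and shapes must match the ensemble's config.pbtxt; these are
# illustrative placeholders.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("RAW_INPUT", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)

response = client.infer(model_name="ensemble_model", inputs=inputs)
scores = response.as_numpy("SCORES")
print(scores.shape)
```

Note that the client addresses only the ensemble; Triton executes the intermediate steps server-side, which avoids extra network hops between sub-models.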
Please refer to the documentation referenced above, and if you run into any issues or would like help with your specific use case, please open a case with AWS Support [4] for SageMaker with the details so that we can assist you further.
References:
[1] NVIDIA Triton Inference Server - https://developer.nvidia.com/triton-inference-server
[4] Creating support cases and case management - https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case