How to implement a SageMaker ensemble model outside AWS


I would like to know whether it is possible to run, outside of AWS, an ensemble model trained in SageMaker on tabular data with a binary classification target.

I am struck by the fact that I have not been able to find examples either on GitHub or in AWS's own documentation, which makes me think that it is not possible to run these models outside of AWS.

Along with the above, I was able to download the artifact for the top-performing model. I was able to run the three models that make up the ensemble individually, but not the model that aggregates their predictions, called predict.pkl, as you can see in the image below.

There are no available versions of autogluon other than 0.8.1, and there are no code examples on how to "fit" the other models. This is why I keep thinking that, although AWS says these models are allowed to be used outside AWS, these obstacles keep people running into errors and force them to use AWS endpoints.
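For context, a minimal sketch of the first step, unpacking the downloaded SageMaker artifact so its contents can be inspected locally. The archive name model.tar.gz and the destination directory are assumptions, and restoring the ensemble afterwards would require a local autogluon install matching the training version:

```python
import tarfile
from pathlib import Path


def extract_model(archive: str, dest: str) -> Path:
    """Extract a SageMaker model.tar.gz so its artifacts can be inspected locally."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_path)
    return dest_path


# With autogluon.tabular installed at the same version used for training,
# the predictor could then (in principle) be restored with:
#   from autogluon.tabular import TabularPredictor
#   predictor = TabularPredictor.load(str(extract_model("model.tar.gz", "model")))
```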

Errors during implementation

I was finally able to take a look at this GitHub repo, but there is no info regarding this case:

I sincerely appreciate any kind of help: a link, a blog post, or steps to follow.

Greetings Alejandro Holguin M

asked 10 months ago · 312 views
1 Answer


I understand that you want to run a SageMaker ensemble model outside of AWS and would like more information on the topic.

Firstly, model deployment in machine learning (ML) is becoming increasingly complex. Often you want to deploy not just one ML model but large groups of ML models represented as ensemble workflows, each composed of multiple models. Productionizing these workflows is challenging because you need to meet various performance and latency requirements.

Amazon SageMaker supports single-instance ensembles with the Triton Inference Server. This capability allows you to run model ensembles that fit on a single instance. Behind the scenes, SageMaker leverages the Triton Inference Server to manage the ensemble on every instance behind the endpoint to maximize throughput and hardware utilization with ultra-low (single-digit milliseconds) inference latency. With Triton, you can also choose from a wide range of supported ML frameworks (including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT) and infrastructure backends, including GPUs, CPUs, and AWS Inferentia.
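To make the ensemble mechanism concrete, here is a hypothetical Triton ensemble configuration (config.pbtxt) that chains a preprocessing model into a classifier. All model, tensor, and file names here are illustrative, not taken from the question's artifacts:

```
# Hypothetical config.pbtxt for a Triton "ensemble" model.
name: "tabular_ensemble"
platform: "ensemble"
max_batch_size: 8
input [ { name: "RAW_FEATURES", data_type: TYPE_FP32, dims: [ -1 ] } ]
output [ { name: "PROBABILITY", data_type: TYPE_FP32, dims: [ 2 ] } ]
ensemble_scheduling {
  step [
    {
      # First step: feature preprocessing model.
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT", value: "RAW_FEATURES" }
      output_map { key: "OUTPUT", value: "features" }
    },
    {
      # Second step: binary classifier consuming the preprocessed features.
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT", value: "features" }
      output_map { key: "OUTPUT", value: "PROBABILITY" }
    }
  ]
}
```

Triton routes the intermediate tensor ("features") between the steps internally, so clients only see the ensemble's top-level inputs and outputs.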

Furthermore, I would like to mention that you can use Triton to implement ensembles outside of AWS, as it is not specific to AWS or SageMaker. [1]

Triton Inference Server is designed to enable teams to deploy, run, and scale trained AI models from any framework on any GPU- or CPU-based infrastructure. In addition, it has been optimized to offer high-performance inference at scale with features like dynamic batching, concurrent runs, optimal model configuration, model ensemble capabilities, and support for streaming inputs.
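As a deployment sketch, Triton can be run on any machine with Docker; the image tag and the model repository path below are assumptions you would adapt to your setup:

```shell
# Hypothetical: run Triton Inference Server outside AWS against a local
# model repository (each model in its own subdirectory with a config.pbtxt).
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$PWD/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:24.01-py3 \
  tritonserver --model-repository=/models
```

Ports 8000/8001/8002 expose the HTTP, gRPC, and metrics endpoints respectively.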

I would request that you please refer to the documentation mentioned above. If you have any difficulty verifying the points above or still run into issues, please reach out to AWS Support [4] (SageMaker) with your issue or use case in detail, and we would be happy to assist you further.





[4] Creating support cases and case management -

answered 10 months ago
