This question is mostly for educational purposes, but the current SageMaker documentation does not describe whether these things are allowed or not.
Let's suppose I have:
- an XGBoost_model_1 (which needs an XGBoost container)
- a KMeans_model_1 and a KMeans_model_2 (both require a KMeans container)
1. Here's the first question. Can I do the following:
- create a Model with InferenceExecutionConfig.Mode=Direct and specify two containers (XGBoost, and KMeans with Mode: MultiModel)
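To make the setup concrete, here is a minimal sketch of the request I have in mind, written as the keyword arguments I would pass to boto3's create_model. The model name, function name and all arguments are placeholders I made up for illustration; nothing is sent to AWS here, and whether SageMaker accepts a MultiModel container inside a Direct-mode multi-container Model is exactly what I am asking.

```python
def build_model_request(role_arn, xgboost_image, kmeans_image,
                        xgboost_model_url, kmeans_models_prefix):
    """Assemble create_model keyword arguments (hypothetical helper, no AWS call)."""
    return {
        "ModelName": "xgboost-plus-kmeans",  # placeholder name
        "ExecutionRoleArn": role_arn,
        # Direct mode: each request names the container it wants to reach.
        "InferenceExecutionConfig": {"Mode": "Direct"},
        "Containers": [
            {
                "ContainerHostname": "XGBoost",
                "Image": xgboost_image,
                "Mode": "SingleModel",
                "ModelDataUrl": xgboost_model_url,  # one model artifact
            },
            {
                "ContainerHostname": "KMeans",
                "Image": kmeans_image,
                "Mode": "MultiModel",
                # Assumption: an S3 prefix holding both KMeans artifacts.
                "ModelDataUrl": kmeans_models_prefix,
            },
        ],
    }
```

With real values, the result would be passed as `boto3.client("sagemaker").create_model(**request)`.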
That would enable the client:
- to call invoke_endpoint(TargetContainer="XGBoost") to access the XGBoost_model_1
- to call invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_1") to access the KMeans_model_1
- to call invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_2") to access the KMeans_model_2
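The three calls above could be sketched like this. Note that TargetContainer is my shorthand: as far as I can tell, the boto3 parameter on invoke_endpoint is spelled TargetContainerHostname. The helper name, endpoint name, payload and content type are placeholders, and the exact TargetModel value (for example whether it needs a .tar.gz suffix) is an assumption on my side.

```python
def build_invoke_kwargs(endpoint_name, body, container=None, model=None, variant=None):
    """Assemble invoke_endpoint keyword arguments (hypothetical helper, no AWS call)."""
    kwargs = {
        "EndpointName": endpoint_name,
        "Body": body,
        "ContentType": "text/csv",  # placeholder content type
    }
    # Only include the routing parameters the caller actually set.
    if container is not None:
        kwargs["TargetContainerHostname"] = container
    if model is not None:
        kwargs["TargetModel"] = model
    if variant is not None:
        kwargs["TargetVariant"] = variant
    return kwargs


# The three calls from the list above, as request dictionaries.
xgboost_call = build_invoke_kwargs("my-endpoint", b"1,2,3", container="XGBoost")
kmeans_1_call = build_invoke_kwargs("my-endpoint", b"1,2,3",
                                    container="KMeans", model="KMeans_model_1")
kmeans_2_call = build_invoke_kwargs("my-endpoint", b"1,2,3",
                                    container="KMeans", model="KMeans_model_2")
```

Each dictionary would then go to `boto3.client("sagemaker-runtime").invoke_endpoint(**kwargs)`.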
I don't see a straight answer in the documentation on whether combining multi-model containers with a multi-container endpoint is possible.
2. The second question: how does the above idea work with ProductionVariants? Can I create something like this:
Variant1 with XGBoost serving XGBoost_model_1 having a weight of 0.5
Variant2 with a Multi-container having both XGBoost and KMeans (with a MultiModel setup) having a weight of 0.5
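As a sketch, the endpoint configuration for these two variants might look like the following create_endpoint_config arguments. The config name, model names, instance type and instance count are placeholders I picked for illustration; "xgboost-only" would be a plain single-container Model and "xgboost-plus-kmeans" the multi-container Model described in the first question. I have not verified that SageMaker accepts this mix.

```python
def build_endpoint_config_request():
    """Assemble create_endpoint_config keyword arguments (hypothetical helper, no AWS call)."""
    return {
        "EndpointConfigName": "two-variant-config",  # placeholder name
        "ProductionVariants": [
            {
                "VariantName": "Variant1",
                "ModelName": "xgboost-only",  # single XGBoost container
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.large",
                "InitialVariantWeight": 0.5,
            },
            {
                "VariantName": "Variant2",
                "ModelName": "xgboost-plus-kmeans",  # multi-container Model
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.large",
                "InitialVariantWeight": 0.5,
            },
        ],
    }
```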
So that the client could:
- call invoke_endpoint(TargetVariant="Variant2", TargetContainer="KMeans", TargetModel="KMeans_model_1") to access the KMeans_model_1
- call invoke_endpoint(TargetVariant="Variant2", TargetContainer="KMeans", TargetModel="KMeans_model_2") to access the KMeans_model_2
- call invoke_endpoint(TargetVariant="Variant1") to access the XGBoost_model_1
- call invoke_endpoint(TargetVariant="Variant2", TargetContainer="XGBoost") to access the XGBoost_model_1
Is that combination even possible?
If so, what happens when the client calls invoke_endpoint without specifying the variant? For example:
- would invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_2") fail 50% of the time? If it hits the right variant it works just fine, but if it hits the wrong one it would most likely result in a 400/500 error ("incorrect payload").
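The 50% figure comes from the following back-of-the-envelope calculation. It rests on two assumptions of mine: that requests without TargetVariant are split across variants in proportion to their weights, and that a request naming a container the chosen variant does not have ends in an error. The second assumption is the part I would like confirmed. The function name and the variant layout are illustrative only.

```python
def untargeted_failure_share(variants, container):
    """Share of untargeted requests that land on a variant lacking `container`.

    Assumes traffic is split in proportion to variant weights.
    """
    total = sum(v["weight"] for v in variants)
    missing = sum(v["weight"] for v in variants if container not in v["containers"])
    return missing / total


variants = [
    {"name": "Variant1", "weight": 0.5, "containers": {"XGBoost"}},
    {"name": "Variant2", "weight": 0.5, "containers": {"XGBoost", "KMeans"}},
]

kmeans_failure_share = untargeted_failure_share(variants, "KMeans")    # 0.5
xgboost_failure_share = untargeted_failure_share(variants, "XGBoost")  # 0.0
```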