Can SageMaker do this? Multi-model endpoint + async inference


Following this document, I am trying to run async inference with a Multi-Model Endpoint.

I tried passing the **kwargs from Model().deploy() to MultiDataModel().deploy(**kwargs). According to that document, MultiDataModel(**kwargs) passes its kwargs through to Model(**kwargs), so I assumed MultiDataModel().deploy(**kwargs) would pass them through to Model().deploy(**kwargs).

So I tried it like this:

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.multidatamodel import MultiDataModel

async_config = AsyncInferenceConfig(
    output_path=f"s3://{bucket}/{prefix}/output",
    max_concurrent_invocations_per_instance=4,
)

mme = MultiDataModel(
    name="MultiModel",
    model_data_prefix=multi_model_s3uri,
    model=model,  # passing our model
    sagemaker_session=sess,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml." + instance_type,
    async_inference_config=async_config,
)

But then I realized that what I get back is a Predictor, not an AsyncPredictor. I want to get an AsyncPredictor (not a Predictor) from MultiDataModel().deploy().

1 Answer

Hi,

Unfortunately, multi-model endpoints do not support async inference yet, but as an alternative you could host several models within a single container and route between them using a BYOS (bring your own script) strategy.
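For the BYOS part, here is a minimal sketch of what a routing entry-point script could look like (assuming the SageMaker XGBoost framework container's model_fn/input_fn/predict_fn hooks; the model file names and the target_model request field are made up for illustration):

# inference.py -- hypothetical routing script for two XGBoost models
import json
import os

import numpy as np
import xgboost as xgb

def model_fn(model_dir):
    # Load every model packaged in the tar; file names are assumptions
    models = {}
    for name in ("xgboost-model1", "xgboost-model2"):
        booster = xgb.Booster()
        booster.load_model(os.path.join(model_dir, name))
        models[name] = booster
    return models

def input_fn(request_body, content_type):
    # Expect JSON carrying both the features and a routing field
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)

def predict_fn(data, models):
    # Route the request to the model it names
    booster = models[data["target_model"]]
    return booster.predict(xgb.DMatrix(np.array([data["features"]]))).tolist()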

Here is an example of using async inference with multiple models packaged in a single tar:

  1. Package multiple models in a single tar.gz file:

import tarfile

with tarfile.open("multi-model.tar.gz", "w:gz") as mm:
    mm.add("./multi-model/xgboost-model1")
    mm.add("./multi-model/xgboost-model2")
  2. Upload the model to an S3 bucket:

import boto3
import sagemaker

boto_session = boto3.Session()
sm_session = sagemaker.session.Session()
sm_client = boto_session.client("sagemaker")
sm_runtime = boto_session.client("sagemaker-runtime")

def upload_model(input_location):
    return sm_session.upload_data(
        input_location,
        bucket=sm_session.default_bucket(),
        key_prefix="async-multi-model-example")

model_url = upload_model("multi-model.tar.gz")
  3. Create a SageMaker model from the pre-trained model tar.gz:

from sagemaker import image_uris

region = boto_session.region_name
container = image_uris.retrieve(region=region, framework="xgboost", version="1.2-1")

create_model_response = sm_client.create_model(
    ModelName="async-multi-model",
    ExecutionRoleArn=sm_role,  # your SageMaker execution role ARN
    PrimaryContainer={
        "Image": container,
        "ModelDataUrl": model_url,
    },
)
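If you go the BYOS route sketched above, you would also need to tell the container which entry-point script to run. A hedged variant of the create_model call (SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY are the environment variables SageMaker framework containers generally use to locate a custom script; the sourcedir path is hypothetical):

create_model_response = sm_client.create_model(
    ModelName="async-multi-model-byos",
    ExecutionRoleArn=sm_role,
    PrimaryContainer={
        "Image": container,
        "ModelDataUrl": model_url,
        "Environment": {
            # Entry point inside the (hypothetical) sourcedir.tar.gz
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "s3://<your-bucket>/code/sourcedir.tar.gz",
        },
    },
)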
  4. Create an endpoint configuration:

s3_bucket = sm_session.default_bucket()
bucket_prefix = "async-multi-model-example"

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName="async-multi-model-config",
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": "async-multi-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": f"s3://{s3_bucket}/{bucket_prefix}/output",
        },
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)
  5. Create the endpoint:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName="async-multi-model-endpoint", EndpointConfigName="async-multi-model-config"
)
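Endpoint creation itself runs asynchronously, so before invoking you may want to block until the endpoint is in service (a standard boto3 waiter, not part of the original steps):

# Block until the endpoint finishes creating
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName="async-multi-model-endpoint")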

  6. Invoke the endpoint asynchronously:

# Specify the S3 location of the input; here, a single libsvm sample
input_location = f"s3://{s3_bucket}/test_point_0.libsvm"

endpoint_name = "async-multi-model-endpoint"
response = sm_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_location)
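invoke_endpoint_async returns immediately; the prediction is written to the OutputLocation given in the response. A small sketch for polling S3 and reading the result (assuming a plain-text output):

import time
import urllib.parse

# Parse the S3 URL that the async invocation will write to
output_url = urllib.parse.urlparse(response["OutputLocation"])
s3 = boto_session.client("s3")

# Poll until the result object appears, then read it
while True:
    try:
        result = s3.get_object(
            Bucket=output_url.netloc,
            Key=output_url.path.lstrip("/"))["Body"].read()
        break
    except s3.exceptions.NoSuchKey:
        time.sleep(5)

print(result.decode("utf-8"))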

This is a customized alternative solution since the feature is not enabled yet, so you may be going a bit off the beaten path with it.

Hope it helps

AWS
Jady
answered 2 years ago
