Can SageMaker do this? Multi-model endpoint + async inference


Following this document, I am trying to run async inference with a Multi-Model Endpoint.

I tried to pass the **kwargs accepted by Model().deploy() to MultiDataModel().deploy(**kwargs). According to that document, MultiDataModel(**kwargs) passes its kwargs through to Model(**kwargs), so I assumed MultiDataModel().deploy(**kwargs) would likewise pass through to Model().deploy(**kwargs).

So I tried this:

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.multidatamodel import MultiDataModel

async_config = AsyncInferenceConfig(
    output_path=f"s3://{bucket}/{prefix}/output",
    max_concurrent_invocations_per_instance=4
)

mme = MultiDataModel(
    name='MultiModel',
    model_data_prefix=multi_model_s3uri,
    model=model,  # passing our model
    sagemaker_session=sess
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml."+ instance_type,
    async_inference_config=async_config
)

But I realized that the object returned is a Predictor, not an AsyncPredictor. I want MultiDataModel().deploy() to give me an AsyncPredictor, not a Predictor.

1 Answer

Hi,

Unfortunately, multi-model endpoints do not support async inference yet. Alternatively, you could host several models within a single container and run them with a BYOS (bring your own script) strategy.

Here is an example that uses async inference for multiple models packaged in a single tar.

  1. Package multiple models in a single tar.gz file:
import tarfile

with tarfile.open("multi-model.tar.gz", "w:gz") as mm:
    mm.add("./multi-model/xgboost-model1")
    mm.add("./multi-model/xgboost-model2")
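With several models in one artifact, the container's inference script has to route each request to the right model. A minimal BYOS-style sketch (the `model_fn`/`predict_fn` handler names follow the SageMaker inference-toolkit convention; the request's `"model"` key is my assumption, not part of the steps above):

```python
import os

# Hypothetical BYOS handlers for a multi-model artifact.
# model_fn loads every model file found in model_dir;
# predict_fn routes each request to the model it names.
def model_fn(model_dir):
    models = {}
    for fname in os.listdir(model_dir):
        if fname.startswith("xgboost-model"):
            # A real script would load the booster here, e.g.
            # xgboost.Booster(model_file=os.path.join(model_dir, fname))
            models[fname] = os.path.join(model_dir, fname)
    return models

def predict_fn(request, models):
    # `request` is assumed to be a dict carrying a "model" key
    target = request.get("model")
    if target not in models:
        raise ValueError(f"unknown model: {target!r}")
    return {"model": target, "artifact": models[target]}
```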
  2. Upload the model to an S3 bucket:
import boto3
import sagemaker

boto_session = boto3.session.Session()
sm_session = sagemaker.session.Session()
sm_client = boto_session.client("sagemaker")
sm_runtime = boto_session.client("sagemaker-runtime")

def upload_model(input_location):
    return sm_session.upload_data(
        input_location,
        bucket=sm_session.default_bucket(),
        key_prefix="async-multi-model-example")

model_url = upload_model("multi-model.tar.gz")
  3. Create a SageMaker model from the pre-trained model tar.gz:
from sagemaker import image_uris

region = boto_session.region_name
container = image_uris.retrieve(region=region, framework="xgboost", version="1.2-1")

model_name = "async-multi-model"
sm_role = sagemaker.get_execution_role()  # IAM role the endpoint will assume

create_model_response = sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=sm_role,
    PrimaryContainer={
        "Image": container,
        "ModelDataUrl": model_url,
    },
)
  4. Create an endpoint config:
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName="async-multi-model-config",
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": f"s3://{s3_bucket}/{bucket_prefix}/output",
        },
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)
  5. Create the endpoint:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName="async-multi-model-endpoint", EndpointConfigName="async-multi-model-config"
)

  6. Invoke the endpoint asynchronously:
# Specify the location of the input. Here, a single SVM sample
input_location = f"s3://{s3_bucket}/test_point_0.libsvm"

endpoint_name = "async-multi-model-endpoint"
response = sm_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_location)
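Note that invoke_endpoint_async returns immediately with an OutputLocation in S3 rather than the prediction itself, so the caller has to fetch the result once it lands. A possible polling sketch (the helper names and timeout values are mine, not part of the answer):

```python
import time
import urllib.parse

def parse_s3_uri(uri):
    # Split "s3://bucket/key" into (bucket, key).
    parsed = urllib.parse.urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")

def wait_for_output(output_location, timeout=300, interval=5):
    # Sketch: poll the OutputLocation returned by invoke_endpoint_async
    # until the result object exists, then return its bytes.
    import boto3  # imported here so the parser above stays stdlib-only
    bucket, key = parse_s3_uri(output_location)
    s3 = boto3.client("s3")
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            time.sleep(interval)
    raise TimeoutError(f"no result yet at {output_location}")

# result = wait_for_output(response["OutputLocation"])
```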

This is a customized alternative since the feature is not available yet, so you may be going a bit off the beaten path with it.

Hope it helps

AWS
Jady
Answered 2 years ago
