Can SageMaker do this? Multi-model endpoint + async inference


Following this document, I am trying to run async inference with a Multi-Model Endpoint.

I tried passing the **kwargs from Model().deploy() to MultiDataModel().deploy(**kwargs). According to that document, MultiDataModel(**kwargs) passes its kwargs through to Model(**kwargs), so I assumed MultiDataModel().deploy(**kwargs) would pass them through to Model().deploy(**kwargs).

So I tried it like this:

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.multidatamodel import MultiDataModel

async_config = AsyncInferenceConfig(
    output_path=f"s3://{bucket}/{prefix}/output",
    max_concurrent_invocations_per_instance=4,
)

mme = MultiDataModel(
    name="MultiModel",
    model_data_prefix=multi_model_s3uri,
    model=model,  # passing our model
    sagemaker_session=sess,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml." + instance_type,
    async_inference_config=async_config,
)

But then I realized that what I get back is a Predictor, not an AsyncPredictor. I want to get an AsyncPredictor (not a Predictor) from MultiDataModel().deploy().

1 Answer

Hi,

Unfortunately, multi-model endpoints do not support async inference yet, but as an alternative you could host several models within a single container and route between them using a BYOS (bring your own script) strategy.
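For the BYOS part, here is a minimal sketch of what a routing entry-point script could look like (assuming the SageMaker XGBoost framework container's model_fn/input_fn/predict_fn hooks; the model file names and the target_model request field are made up for illustration):

# inference.py -- hypothetical routing script for two XGBoost models
import json
import os

import numpy as np
import xgboost as xgb

def model_fn(model_dir):
    # Load every model packaged in the tar; file names are assumptions
    models = {}
    for name in ("xgboost-model1", "xgboost-model2"):
        booster = xgb.Booster()
        booster.load_model(os.path.join(model_dir, name))
        models[name] = booster
    return models

def input_fn(request_body, content_type):
    # Expect JSON carrying both the features and a routing field
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)

def predict_fn(data, models):
    # Route the request to the model it names
    booster = models[data["target_model"]]
    return booster.predict(xgb.DMatrix(np.array([data["features"]]))).tolist()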

Here is an example of using async inference with multiple models packaged in a single tar:

  1. Package multiple models in a single tar.gz file:

import tarfile

with tarfile.open("multi-model.tar.gz", "w:gz") as mm:
    mm.add("./multi-model/xgboost-model1")
    mm.add("./multi-model/xgboost-model2")
  2. Upload the model to an S3 bucket:

import boto3
import sagemaker

boto_session = boto3.Session()
sm_session = sagemaker.session.Session()
sm_client = boto_session.client("sagemaker")
sm_runtime = boto_session.client("sagemaker-runtime")

def upload_model(input_location):
    return sm_session.upload_data(
        input_location,
        bucket=sm_session.default_bucket(),
        key_prefix="async-multi-model-example")

model_url = upload_model("multi-model.tar.gz")
  3. Create a SageMaker model from the pre-trained model tar.gz:

from sagemaker import image_uris

region = boto_session.region_name
container = image_uris.retrieve(region=region, framework="xgboost", version="1.2-1")

create_model_response = sm_client.create_model(
    ModelName="async-multi-model",
    ExecutionRoleArn=sm_role,  # your SageMaker execution role ARN
    PrimaryContainer={
        "Image": container,
        "ModelDataUrl": model_url,
    },
)
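If you go the BYOS route sketched above, you would also need to tell the container which entry-point script to run. A hedged variant of the create_model call (SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY are the environment variables SageMaker framework containers generally use to locate a custom script; the sourcedir path is hypothetical):

create_model_response = sm_client.create_model(
    ModelName="async-multi-model-byos",
    ExecutionRoleArn=sm_role,
    PrimaryContainer={
        "Image": container,
        "ModelDataUrl": model_url,
        "Environment": {
            # Entry point inside the (hypothetical) sourcedir.tar.gz
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "s3://<your-bucket>/code/sourcedir.tar.gz",
        },
    },
)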
  4. Create an endpoint configuration:

s3_bucket = sm_session.default_bucket()
bucket_prefix = "async-multi-model-example"

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName="async-multi-model-config",
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": "async-multi-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": f"s3://{s3_bucket}/{bucket_prefix}/output",
        },
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)
  5. Create the endpoint:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName="async-multi-model-endpoint", EndpointConfigName="async-multi-model-config"
)
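Endpoint creation itself runs asynchronously, so before invoking you may want to block until the endpoint is in service (a standard boto3 waiter, not part of the original steps):

# Block until the endpoint finishes creating
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName="async-multi-model-endpoint")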

  6. Invoke the endpoint asynchronously:

# Specify the S3 location of the input; here, a single libsvm sample
input_location = f"s3://{s3_bucket}/test_point_0.libsvm"

endpoint_name = "async-multi-model-endpoint"
response = sm_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_location)
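invoke_endpoint_async returns immediately; the prediction is written to the OutputLocation given in the response. A small sketch for polling S3 and reading the result (assuming a plain-text output):

import time
import urllib.parse

# Parse the S3 URL that the async invocation will write to
output_url = urllib.parse.urlparse(response["OutputLocation"])
s3 = boto_session.client("s3")

# Poll until the result object appears, then read it
while True:
    try:
        result = s3.get_object(
            Bucket=output_url.netloc,
            Key=output_url.path.lstrip("/"))["Body"].read()
        break
    except s3.exceptions.NoSuchKey:
        time.sleep(5)

print(result.decode("utf-8"))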

This is a customized alternative solution since the feature is not enabled yet, so you may be going a bit off the beaten path with it.

Hope it helps

AWS
Jady
answered 2 years ago
