How to perform pb_utils.InferenceRequest between models when using SageMaker Triton

1

I have a Triton model repository consisting of 5 of what Triton calls models: an ensemble called embedder that invokes a pipeline (tokenizer -> transformer -> pooler). This works fine in SageMaker if I create the SageMaker model with "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "embedder"}
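
For reference, this is roughly how I'm creating the SageMaker model (the image URI, role ARN and S3 path below are placeholders, not my actual values):

import boto3

sm_client = boto3.client("sagemaker")

# Placeholders for illustration - substitute your own container image, role and model archive
sm_client.create_model(
    ModelName="embedder-model",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
        "ModelDataUrl": "s3://my-bucket/triton/model.tar.gz",
        # Point Triton at the ensemble as the default model
        "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "embedder"},
    },
)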

I then added a custom BLS (Python) model called batcher, which simply splits the input request into batches, calls embedder with each batch, and then combines the results (later I want to invoke another model with the combined results, one that cannot be batched).
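
Simplified, the batcher's model.py does roughly the following (the input/output names and batch size here are illustrative, not my exact code):

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            text = pb_utils.get_input_tensor_by_name(request, "text").as_numpy()

            # Split the incoming request into smaller batches and call the
            # embedder ensemble once per batch
            outputs = []
            for chunk in np.array_split(text, max(1, len(text) // 32)):
                infer_request = pb_utils.InferenceRequest(
                    model_name="embedder",
                    inputs=[pb_utils.Tensor("text", chunk)],
                    requested_output_names=["embedding"],
                )
                infer_response = infer_request.exec()
                if infer_response.has_error():
                    raise pb_utils.TritonModelException(infer_response.error().message())
                outputs.append(
                    pb_utils.get_output_tensor_by_name(infer_response, "embedding").as_numpy()
                )

            # Combine the per-batch results into a single response
            embeddings = np.concatenate(outputs, axis=0)
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("embedding", embeddings)]
                )
            )
        return responses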

This works fine if I use the official Triton container on my local machine (external to AWS). But it won't work in SageMaker: specifically, the line that calls one model from another inside Triton fails with the SageMaker container but works with the NVIDIA container:

infer_request = pb_utils.InferenceRequest(
    model_name="embeddings",
    inputs=[batch],
    requested_output_names=["embedding"],
)

infer_response = infer_request.exec()

The error is

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{"error":"Failed to process the request(s) for model instance 'bls_sync_0', message: TritonModelException: Failed for execute the inference request. Model 'add_sub' is not ready.\n\nAt:\n  /opt/ml/model/bls_sync/1/model.py(112): execute\n"}"

This error is raised on the last line of the previous code fragment, i.e. BLS scripting isn't working between models in the same repository. I've also reproduced it using the simple NVIDIA examples (the quoted error is from the bls_sync/add_sub reproduction).

I also tried to configure the model with "Mode": "MultiModel", but the SageMaker multi-model behaviour appears different from Triton's. With Triton I can invoke any of the 5 models by passing the model name (e.g. tokenizer or batcher), whereas SageMaker requires that you pass a model archive (i.e. tar.gz) filename. If I try to invoke a SageMaker endpoint with TargetModel="batcher" I get: Failed to download model models/my-model/my-model-name.tar.gzbatcher
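
For context, this is roughly how I'm invoking the multi-model endpoint (the endpoint name, TargetModel path and payload below are illustrative):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": [
        {"name": "text", "shape": [1, 1], "datatype": "BYTES", "data": ["hello world"]}
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="my-triton-endpoint",
    ContentType="application/octet-stream",
    # SageMaker appears to expect the relative path of a .tar.gz archive here,
    # not a Triton model name
    TargetModel="batcher.tar.gz",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))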

Can you make requests between models in AWS Triton (i.e. using pb_utils.InferenceRequest)? When I copy the official example, which works (https://github.com/triton-inference-server/python_backend/blob/main/examples/bls/sync_model.py), I get the exact same error message when the "bls_sync" model attempts to invoke the "add_sub" model inside a SageMaker Triton container.

Dave
asked 8 months ago · 408 views
3 Answers
1
Accepted Answer

Hi there,

Without knowing more details about your SageMaker Endpoint, it will be difficult to properly debug your issue. I would like to suggest that, if possible, you open a case with SageMaker Premium Support.

With that being said, I have done some testing with one of our SageMaker samples that uses Business Logic Scripting (BLS) with Stable Diffusion on a Triton Inference Server container. The sample's model.py script demonstrates how to use pb_utils.InferenceRequest and is very similar to the official example. I was able to deploy the endpoint and invoke the model successfully. The container used in testing was 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.10-py3
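
For reference, the endpoint in my test was created along these lines (the model, config and endpoint names below are placeholders; ml.g5.4xlarge is simply what I used):

import boto3

sm_client = boto3.client("sagemaker")

# Placeholder names for illustration
model_name = "sd-bls-model"            # SageMaker model created from the Triton container above
endpoint_config_name = "sd-bls-config"
endpoint_name = "sd-bls-endpoint"

sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.g5.4xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)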

There are a few bugs that I did encounter with the notebook itself, such as the sd_env.tar.gz archive under the /model_repository/pipeline directory causing an error when testing the container locally; renaming it to hf_env.tar.gz fixed this issue. Also, please use the following lines of code when waiting for the endpoint to become InService:

import time

# sm_client is a boto3 SageMaker client, e.g. sm_client = boto3.client("sagemaker")
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

# Poll until the endpoint leaves the "Creating" state
while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

You may have issues running !pip install -U sagemaker pywidgets numpy PIL if using a notebook instance; I only updated the SageMaker SDK and did not encounter any issues with the other libraries. Please note I used an ml.g5.4xlarge instance and the conda_python3 kernel during my testing.

AWS
SUPPORT ENGINEER
answered 8 months ago
0

Thanks @Thayin

I read through the example and figured out what the key step is (it's in the container definition shown below).


I'd already suspected this from something else I read, but couldn't work out how to rectify it, so thanks for the example! Basically, Triton loads models on demand. As far as I can tell, if you use an ensemble it pre-loads the steps, but if you invoke one model from another via BLS it is unaware of the dependency and doesn't (or at least may not) preload the target model.

So you need to explicitly preload the models, i.e.:

container = {
    "Image": mme_triton_image_uri,
    "ModelDataUrl": model_data_url,
    "Environment": {
        "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "pipeline",
        # The extra --load-model flags are appended after the log setting so that
        # Triton explicitly loads the models invoked via BLS
        "SAGEMAKER_TRITON_LOG_INFO": "false --load-model=text_encoder --load-model=vae",
    },
}

Thanks for your help!

Dave
answered 8 months ago
0

PS This seems to be a slight hack: it takes advantage of the fact that the setting controlling logging is appended to the Triton start command, and tacks additional --load-model arguments onto the end. That feels a little fragile; perhaps this should be exposed explicitly as its own Environment variable so it's maintained deliberately?

Dave
answered 8 months ago
