How to perform pb_utils.InferenceRequest between models when using SageMaker Triton


I have a Triton model repository consisting of five of what Triton calls models: an ensemble called embedder that invokes a pipeline (tokenizer -> transformer -> pooler). This works fine in SageMaker if I create the SageMaker model with "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "embedder"}.
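
For reference, I create the SageMaker model roughly like this (a minimal sketch; triton_image_uri, model_data_url, role and the model name are placeholders):

import boto3

sm_client = boto3.client("sagemaker")

container = {
    "Image": triton_image_uri,        # SageMaker Triton container image URI (placeholder)
    "ModelDataUrl": model_data_url,   # s3:// path to the packaged model repository (placeholder)
    "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "embedder"},
}

sm_client.create_model(
    ModelName="triton-embedder",      # placeholder name
    PrimaryContainer=container,
    ExecutionRoleArn=role,            # placeholder execution role
)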

I then added a custom BLS (Python) model called batcher, which simply splits the input request into batches, calls embedder with each batch, and then combines the results (later I want to invoke another model with the combined results - one that cannot be batched).
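
In outline, the batcher's model.py does something like this (a simplified sketch - the tensor names "INPUT"/"embedding" and the fixed batch size of 32 are illustrative, not my exact code):

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the full input tensor from the incoming request
            full_input = pb_utils.get_input_tensor_by_name(request, "INPUT").as_numpy()
            outputs = []
            # Split into fixed-size batches and call the embedder for each one
            for start in range(0, len(full_input), 32):
                batch = pb_utils.Tensor("INPUT", full_input[start:start + 32])
                infer_request = pb_utils.InferenceRequest(
                    model_name="embedder",
                    inputs=[batch],
                    requested_output_names=["embedding"],
                )
                infer_response = infer_request.exec()
                if infer_response.has_error():
                    raise pb_utils.TritonModelException(infer_response.error().message())
                outputs.append(
                    pb_utils.get_output_tensor_by_name(infer_response, "embedding").as_numpy()
                )
            # Stitch the per-batch embeddings back together into one response
            combined = pb_utils.Tensor("embedding", np.concatenate(outputs, axis=0))
            responses.append(pb_utils.InferenceResponse(output_tensors=[combined]))
        return responses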

This works fine if I use the official Triton container on my local machine (external to AWS). But it won't work in SageMaker: specifically, the line that calls one model from another inside Triton fails in the SageMaker container but works in the NVIDIA container.

# Inside the batcher's execute(): forward one batch to the embedding model via BLS
infer_request = pb_utils.InferenceRequest(
    model_name="embeddings",
    inputs=[batch],
    requested_output_names=["embedding"],
)

infer_response = infer_request.exec()

The error is

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{"error":"Failed to process the request(s) for model instance 'bls_sync_0', message: TritonModelException: Failed for execute the inference request. Model 'add_sub' is not ready.\n\nAt:\n  /opt/ml/model/bls_sync/1/model.py(112): execute\n"}"

This error is raised on the last line of the code fragment above, i.e. BLS between models in the same repository isn't working. I've also reproduced it with the simple NVIDIA examples (the message above, which references bls_sync and add_sub, is from that reproduction).

I also tried configuring the model with "Mode": "MultiModel", but the SageMaker multi-model behaviour appears to be different from Triton's. With Triton I can invoke any of the 5 models by passing the model name (e.g. tokenizer or batcher), but SageMaker requires that you pass a model archive (i.e. tar.gz) filename. If I invoke the SageMaker endpoint with TargetModel="batcher" I get: Failed to download model models/my-model/my-model-name.tar.gzbatcher.
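
For reference, the invocation looks roughly like this (a sketch; endpoint_name and payload are placeholders) - SageMaker's multi-model mode expects TargetModel to be a .tar.gz archive name under the endpoint's S3 prefix, not a Triton model name:

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,              # placeholder
    TargetModel="my-model-name.tar.gz",      # must name an archive; "batcher" alone fails
    ContentType="application/json",
    Body=json.dumps(payload),                # placeholder Triton inference payload
)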

So: can you make requests between models in SageMaker Triton, i.e. using pb_utils.InferenceRequest? When I copy the official example, which works locally (https://github.com/triton-inference-server/python_backend/blob/main/examples/bls/sync_model.py), I get exactly the same error message when the "bls_sync" model attempts to invoke the "add_sub" model inside a SageMaker Triton container.

Dave
Asked 9 months ago · 436 views
3 Answers

Accepted Answer

Hi there,

Without knowing more details about your SageMaker Endpoint, it will be difficult to properly debug your issue. I would like to suggest that, if possible, you open a case with SageMaker Premium Support.

With that being said, I have done some testing with one of our SageMaker samples that uses Business Logic Scripting (BLS) with Stable Diffusion on a Triton Inference Server container. Its model.py script demonstrates how to use pb_utils.InferenceRequest and is very similar to the official example. I was able to deploy the endpoint and invoke the model successfully. The container used in testing was 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.10-py3.

There were a few bugs I did encounter with the notebook itself, such as the name of the main model under the /model_repository/pipeline directory (sd_env.tar.gz) causing an error when testing the container locally; changing the name to hf_env.tar.gz fixed this issue. Also, please use the following lines of code when waiting for the endpoint to become InService:

import time

import boto3

sm_client = boto3.client("sagemaker")

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

# Poll until the endpoint leaves the Creating state
while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

You may have issues running !pip install -U sagemaker pywidgets numpy PIL if you are using a notebook instance; I only updated the SageMaker SDK and did not encounter any issues with the other libraries. Please note that I used an ml.g5.4xlarge instance and the conda_python3 kernel during my testing.

AWS
Support Engineer
Answered 8 months ago

Thanks @Thayin

I read through the example and figured out what the key step is, shown in the container definition below.


I'd already suspected this from something else I read but couldn't work out how to fix it, so thanks for the example! Basically, Triton loads models on demand. As far as I can tell, if you use an ensemble it pre-loads the steps, but if you invoke one model from another it's unaware of the dependency and doesn't (or at least may not) preload the target model.

So you need to explicitly pre-load the models, i.e.:

container = {
    "Image": mme_triton_image_uri,
    "ModelDataUrl": model_data_url,
    "Environment": {
        "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "pipeline",
        # The extra --load-model flags are appended to the server start command
        # along with the log setting, so Triton loads these models up front
        "SAGEMAKER_TRITON_LOG_INFO": "false --load-model=text_encoder --load-model=vae",
    },
}
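
The container definition is then used as normal when creating the model and endpoint (a sketch; the names, instance type and role are placeholders):

sm_client.create_model(
    ModelName="triton-pipeline",
    PrimaryContainer=container,
    ExecutionRoleArn=role,
)

sm_client.create_endpoint_config(
    EndpointConfigName="triton-pipeline-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "triton-pipeline",
            "InstanceType": "ml.g5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

sm_client.create_endpoint(
    EndpointName="triton-pipeline-endpoint",
    EndpointConfigName="triton-pipeline-config",
)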

Thanks for your help!

Dave
Answered 8 months ago

PS: This seems to be a slight hack. You appear to be taking advantage of the fact that the setting that controls logging is appended to the container start command, and tacking additional arguments onto the end. That feels a little fragile; perhaps it should be exposed as its own Environment variable so it can be maintained explicitly?

Dave
Answered 8 months ago
