How to invoke a multi-model endpoint in Triton Server?

Setup: I am testing a multi-model endpoint on an NVIDIA Triton server. I created the multi-model endpoint based on the AWS docs here (https://docs.aws.amazon.com/sagemaker/latest/dg/create-multi-model-endpoint.html#create-multi-model-endpoint-sdk-gpu). The model I'm using is a DistilBERT model (https://huggingface.co/docs/transformers/model_doc/distilbert). Based on the docs here, https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-python-backend/, I created a model.py file for the Python backend (sample below), and I also have a config.pbtxt file. Based on the example here, https://raw.githubusercontent.com/triton-inference-server/python_backend/main/examples/add_sub/model.py, we need to implement an execute method, and execute takes a list of pb_utils.InferenceRequest objects as its only argument.

Before this, in a single-model implementation, I invoked the endpoint by passing JSON data as the payload. So I understand I have to convert my text into the pb_utils.InferenceRequest type, since the Python backend for Triton server only accepts pb_utils.InferenceRequest, but I couldn't find any examples of this. What are the steps I need to migrate to a multi-model endpoint in an NVIDIA Triton container? Do I need to convert the text to tokens with a tokenizer before sending it to the Python backend?

config.pbtxt

name: "some_model_config"
backend: "python"
max_batch_size: ??
input [
    {
        name: "INPUT0"
        data_type: TYPE_STRING
        dims: [ -1 ]
    }
]
output [
    {
        name: "output"
        data_type: TYPE_STRING
        dims: [ -1 ]
    }
]
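
A note on how this config meets the wire format: clients never build pb_utils.InferenceRequest themselves; Triton constructs those objects server-side from an HTTP request in the KServe v2 inference protocol, where a TYPE_STRING input is declared with datatype "BYTES". A request body matching the config above might look like this sketch (the leading 1 in shape is the batch dimension implied by max_batch_size, and one string per request is my assumption):

import json

# Sketch of a KServe v2 inference request body for the config above.
request_body = {
    "inputs": [
        {
            "name": "INPUT0",            # must match config.pbtxt
            "shape": [1, 1],             # [batch, num_strings]
            "datatype": "BYTES",         # wire datatype for TYPE_STRING
            "data": ["some text here"],
        }
    ]
}
print(json.dumps(request_body, indent=2))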

invoke a multi-model endpoint

payload = "some text here"
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
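
For a multi-model endpoint, two things change on the client side: the body follows the Triton (KServe v2) inference protocol rather than free-form JSON, and invoke_endpoint needs a TargetModel parameter naming the model archive under the endpoint's S3 prefix. A minimal sketch, assuming a hypothetical archive name distilbert.tar.gz and endpoint name (the ContentType follows the AWS Triton samples):

import json

import boto3

runtime_sm_client = boto3.client("sagemaker-runtime")
endpoint_name = "my-triton-mme-endpoint"  # hypothetical

# Same v2-protocol body as in the sketch above.
payload = {
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1, 1],
            "datatype": "BYTES",
            "data": ["some text here"],
        }
    ]
}

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",  # content type used in the AWS Triton samples
    Body=json.dumps(payload),
    TargetModel="distilbert.tar.gz",         # hypothetical archive in the MME S3 prefix
)
result = json.loads(response["Body"].read())
print(result["outputs"][0]["data"])
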
model.py

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        ...

    def execute(self, requests):
        """`execute` must be implemented in every Python model. `execute`
        receives a list of pb_utils.InferenceRequest objects as the only
        argument. This function is called when an inference is requested
        for this model.

        Parameters
        ----------
        requests : list
            A list of pb_utils.InferenceRequest

        Returns
        -------
        list
            A list of pb_utils.InferenceResponse. The length of this list
            must be the same as `requests`.
        """
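
For illustration, here is a minimal, untested sketch of what model.py could look like for this setup, with tokenization done inside the backend so the client only ever sends raw text. The checkpoint name, the self.* attributes, and the JSON output format are my assumptions, not code from the AWS blog:

import json

import numpy as np
import torch
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForSequenceClassification, AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Hypothetical checkpoint; load tokenizer and model once per instance.
        self.tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        self.model = AutoModelForSequenceClassification.from_pretrained(
            "distilbert-base-uncased"
        )
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # TYPE_STRING inputs arrive as a numpy object array of bytes.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            texts = [
                t.decode("utf-8") if isinstance(t, bytes) else str(t)
                for t in in0.as_numpy().flatten()
            ]

            # Tokenize inside the backend, then run the model.
            encoded = self.tokenizer(
                texts, padding=True, truncation=True, return_tensors="pt"
            )
            with torch.no_grad():
                logits = self.model(**encoded).logits
            preds = logits.argmax(dim=-1).tolist()

            # TYPE_STRING outputs are object arrays of UTF-8 bytes.
            out = np.array(
                [json.dumps({"prediction": p}).encode("utf-8") for p in preds],
                dtype=object,
            )
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("output", out)]
                )
            )
        return responses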