How to invoke a multi-model endpoint in Triton server?


Setup: I am testing a multi-model endpoint on an NVIDIA Triton server. I created the multi-model endpoint based on the AWS docs here (https://docs.aws.amazon.com/sagemaker/latest/dg/create-multi-model-endpoint.html#create-multi-model-endpoint-sdk-gpu). The model I'm using is a DistilBERT model (https://huggingface.co/docs/transformers/model_doc/distilbert). Based on the docs here, https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-python-backend/, I created a model.py file for the Python backend (sample below), and I also have a config.pbtxt file. Based on the example here, https://raw.githubusercontent.com/triton-inference-server/python_backend/main/examples/add_sub/model.py, we need to implement an `execute` method, and `execute` takes a list of pb_utils.InferenceRequest objects as its only argument. Previously, in a single-model implementation, I invoked the endpoint by passing JSON data as the payload. So I understand I have to convert my text into pb_utils.InferenceRequest form, since the Python backend for Triton server only accepts pb_utils.InferenceRequest objects, but I couldn't find any examples of this. What are the steps I need to migrate to a multi-model endpoint in an NVIDIA Triton container? Do I need to convert the text to tokens via a tokenizer before sending it to the Python backend?
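For reference, the SageMaker docs linked above expect each model.tar.gz uploaded for the multi-model endpoint to follow the Triton model repository layout. A sketch of what mine looks like (the model directory name is my own choice, not prescribed):

```
model.tar.gz
└── distilbert/
    ├── config.pbtxt
    └── 1/
        └── model.py
```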

config.pbtxt

name: "some_model_config"
backend: "python"
max_batch_size: ??
input: [
    {
        name: "INPUT0"
        data_type: TYPE_STRING
        dims: [ -1 ]
    }
]
output: [
    {
        name: "output"
        data_type: TYPE_STRING
        dims: [ -1 ]
    }
]
invoke a multi-model endpoint

import json

payload = "some text here"
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
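On the conversion question: as far as I can tell, the client never constructs a pb_utils.InferenceRequest itself. The SageMaker Triton container accepts Triton's KServe v2 inference JSON, and the server hands it to the Python backend as InferenceRequest objects. A sketch of how I think the payload should be built instead of the raw string above (the endpoint name and TargetModel value are placeholders for my own; for a multi-model endpoint, TargetModel is required):

```python
import json

text = "some text here"

# KServe v2 inference request body. TYPE_STRING in config.pbtxt maps to
# datatype "BYTES" on the wire; the raw string goes directly into "data".
payload = {
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1, 1],        # [batch, num_strings]; assumes max_batch_size > 0
            "datatype": "BYTES",
            "data": [text],
        }
    ]
}
body = json.dumps(payload)

# Invocation would then look roughly like this (assumed names, not run here):
# response = runtime_sm_client.invoke_endpoint(
#     EndpointName=endpoint_name,
#     ContentType="application/json",
#     TargetModel="distilbert.tar.gz",   # required for multi-model endpoints
#     Body=body,
# )
```

If this is right, the raw text can be sent as-is and the tokenization can happen server-side in model.py.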
model.py

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        ...

    def execute(self, requests):
        """`execute` must be implemented in every Python model. The `execute`
        function receives a list of pb_utils.InferenceRequest objects as its
        only argument. This function is called when an inference is requested
        for this model.

        Parameters
        ----------
        requests : list
            A list of pb_utils.InferenceRequest

        Returns
        -------
        list
            A list of pb_utils.InferenceResponse. The length of this list
            must be the same as `requests`.
        """
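On the tokenizer question: the pattern in the AWS blog linked above is to send raw text and tokenize inside model.py, loading the tokenizer once in initialize(). Inside execute(), a TYPE_STRING input shows up as a numpy object array of bytes that has to be decoded first. A minimal sketch of that decoding step, with the pb_utils calls (taken from the add_sub example) shown only in comments:

```python
import numpy as np


def decode_string_tensor(batch):
    """Decode a Triton TYPE_STRING tensor (numpy object array of bytes)
    into a flat list of Python strings."""
    return [
        item.decode("utf-8") if isinstance(item, (bytes, bytearray)) else str(item)
        for item in np.asarray(batch, dtype=object).reshape(-1)
    ]


# Inside execute(), usage would look roughly like this (assumed wiring):
#
#   in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
#   texts = decode_string_tensor(in_tensor.as_numpy())
#   tokens = self.tokenizer(texts, ...)   # tokenizer loaded in initialize()
```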
Asked 1 year ago · 141 views
No answers
