Setup: testing a multi-model endpoint on an NVIDIA Triton server. I created the multi-model endpoint based on the AWS docs here (https://docs.aws.amazon.com/sagemaker/latest/dg/create-multi-model-endpoint.html#create-multi-model-endpoint-sdk-gpu). The model I'm using is a DistilBERT model (https://huggingface.co/docs/transformers/model_doc/distilbert). Based on the docs here, https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-python-backend/, I created a model.py file for the Python backend (sample below), and I also have a config.pbtxt file.

Based on the example here, https://raw.githubusercontent.com/triton-inference-server/python_backend/main/examples/add_sub/model.py, we need to implement an execute method, and execute receives a list of pb_utils.InferenceRequest as its only argument. Before this, in a single-model implementation, I invoked the endpoint by passing JSON data as the payload. So I understand my text has to reach the Python backend as pb_utils.InferenceRequest objects, since that is the only type execute accepts, but I couldn't find any examples of how that conversion happens.

What are the steps I need to migrate to a multi-model endpoint in an NVIDIA Triton container? And do I need to convert the text to tokens via a tokenizer before sending it to the Python backend?
config.pbtxt
name: "some_model_config"
backend: "python"
max_batch_size: ??
input [
  {
    name: "INPUT0"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
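For context, the packaging I used per the MME-GPU doc above: each model.tar.gz contains a standard Triton model repository entry, with config.pbtxt at the top and model.py in a numbered version directory. Roughly (the directory name is mine):

distilbert/
├── config.pbtxt        <- the config above
└── 1/
    └── model.py        <- the Python backend model below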
invoke a multi-model endpoint (my current single-model style call)
payload = "some text here"
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
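My current guess for the migrated call, based on the MME docs and Triton's KServe v2 protocol: the body becomes a v2-style "inputs" structure (a TYPE_STRING tensor is sent as datatype BYTES on the wire), and invoke_endpoint gains a TargetModel parameter naming the artifact. A sketch, where the artifact name, shape, and content type are my assumptions:

import json

# v2-style request body matching the config above; shape [1, 1] assumes
# one string element plus the batch dimension that max_batch_size adds.
payload = {
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1, 1],
            "datatype": "BYTES",  # TYPE_STRING on the wire
            "data": ["some text here"],
        }
    ]
}

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",  # as in the AWS Triton examples; assumption on my part
    Body=json.dumps(payload),
    TargetModel="distilbert-v1.tar.gz",      # placeholder: the .tar.gz under the MME S3 prefix
)
result = json.loads(response["Body"].read().decode("utf-8"))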
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        ....

    def execute(self, requests):
        """`execute` must be implemented in every Python model. The
        `execute` function receives a list of pb_utils.InferenceRequest
        as the only argument. It is called when an inference is requested
        for this model.

        Parameters
        ----------
        requests : list
            A list of pb_utils.InferenceRequest

        Returns
        -------
        list
            A list of pb_utils.InferenceResponse, the same length as
            `requests`.
        """
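And this is the direction I'm considering for model.py, which would answer my last question by doing the tokenization server-side, so the client keeps sending plain text. The checkpoint name and the JSON-string output format are my own assumptions; the pb_utils calls come from the add_sub example linked above. Is this the right approach?

import json
import numpy as np
import torch
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForSequenceClassification, AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Assumption: a sequence-classification DistilBERT checkpoint;
        # swap in whatever checkpoint the model.tar.gz actually packages.
        name = "distilbert-base-uncased-finetuned-sst-2-english"
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.model = AutoModelForSequenceClassification.from_pretrained(name)
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # Triton hands the client's BYTES tensor to us here; the client
            # never constructs pb_utils.InferenceRequest itself.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            texts = [
                t.decode("utf-8") if isinstance(t, bytes) else str(t)
                for t in in_tensor.as_numpy().reshape(-1)
            ]

            # Tokenize server-side, then run the model.
            encoded = self.tokenizer(
                texts, padding=True, truncation=True, return_tensors="pt"
            )
            with torch.no_grad():
                logits = self.model(**encoded).logits
            labels = [self.model.config.id2label[int(i)] for i in logits.argmax(dim=-1)]

            # One JSON string per input row; dtype=object makes a BYTES tensor.
            out_np = np.array(
                [json.dumps({"label": label}).encode("utf-8") for label in labels],
                dtype=object,
            ).reshape(-1, 1)  # keep the batch dimension to match dims: [ -1 ] with batching on
            out_tensor = pb_utils.Tensor("output", out_np)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses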