Can I host multiple models using NVIDIA Triton Business Logic Scripting (BLS) behind a SageMaker Multi-Model Endpoint (MME)?


I'd like to host models that need BLS on a GPU-backed real-time inference endpoint using MME, and to scale to hundreds or thousands of such models behind one endpoint. Will this work out of the box? I know this is supported for models that don't use BLS, but the documentation is not clear on whether an entire BLS pipeline can be treated as an individual model by the auto-scaler.

Better description of my use case:

The model consists of:

  • Model slice A: weights selected from a set of ~5 options; called once per invocation; the set of options never changes
  • Model slice B: weights selected from a set of hundreds of options; called n times in a loop per invocation; new variants are constantly added

Taken on its own, this model seems like a good candidate for Triton BLS: each slice becomes its own Triton model, and the slice B model is called n times in a loop from the BLS pipeline's model.py.
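To make the shape of the pipeline concrete, here is a minimal sketch of the kind of BLS model.py I have in mind. The model names (slice_a, slice_b_<variant>), tensor names, and the way n and the variant id are passed in are placeholders for illustration, not my actual configuration; in the per-variant split described further down, the slice B model name would instead be fixed inside each pipeline.

```python
# model.py for the BLS orchestrator (Triton Python backend).
# Names like "slice_a", "slice_b_<variant>", "INPUT", "OUTPUT" are placeholders.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            data = pb_utils.get_input_tensor_by_name(request, "INPUT")
            n = int(pb_utils.get_input_tensor_by_name(request, "N_STEPS").as_numpy()[0])
            variant = (
                pb_utils.get_input_tensor_by_name(request, "VARIANT_ID")
                .as_numpy()[0]
                .decode()
            )

            # Slice A: called exactly once per invocation.
            a_request = pb_utils.InferenceRequest(
                model_name="slice_a",
                requested_output_names=["OUTPUT"],
                inputs=[data],
            )
            a_response = a_request.exec()
            if a_response.has_error():
                raise pb_utils.TritonModelException(a_response.error().message())
            intermediate = pb_utils.get_output_tensor_by_name(a_response, "OUTPUT")

            # Slice B: one of hundreds of variants, called n times in a loop.
            for _ in range(n):
                b_request = pb_utils.InferenceRequest(
                    model_name=f"slice_b_{variant}",
                    requested_output_names=["OUTPUT"],
                    inputs=[pb_utils.Tensor("INPUT", intermediate.as_numpy())],
                )
                b_response = b_request.exec()
                if b_response.has_error():
                    raise pb_utils.TritonModelException(b_response.error().message())
                intermediate = pb_utils.get_output_tensor_by_name(b_response, "OUTPUT")

            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("OUTPUT", intermediate.as_numpy())]
                )
            )
        return responses
```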

My first thought was to add all the model variants to a single BLS model repository, but I am not sure whether this would work with auto-scaling and with new models being added frequently.
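Concretely, in that single-repository option I am picturing a layout roughly like the one below, with one Python-backend BLS orchestrator plus one model directory per slice A/B variant (all directory names, and the use of TensorRT .plan files, are placeholders):

```
model_repository/
├── bls_pipeline/               # Python backend orchestrator (model.py sketched above)
│   ├── config.pbtxt
│   └── 1/model.py
├── slice_a_v1/ … slice_a_v5/   # the ~5 fixed slice A variants
├── slice_b_0001/               # hundreds of slice B variants
│   ├── config.pbtxt
│   └── 1/model.plan
├── slice_b_0002/
└── …                           # new slice_b_* directories added over time
```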

The other possibility is to split it into m BLS pipelines, where m is the number of variants I have for model slice B. This should work with auto-scaling, since each MME target model would then match the model requested at invocation, but I am not sure whether this is supported (i.e., an entire BLS model hierarchy being loaded and unloaded by SageMaker).
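Under this split, each slice B variant would get its own MME artifact (a tarball containing a copy of the BLS pipeline, slice A, and that one slice B model), and requests would be routed with TargetModel. Here is a rough sketch of how I would expect to invoke it; the endpoint name, the variant_0042.tar.gz naming convention, and the payload format are placeholders, and I would still need to confirm the exact content type the SageMaker Triton container expects:

```python
# Rough sketch of invoking one per-variant BLS pipeline on an MME.
# Endpoint name, artifact naming, and payload format are assumptions.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# KServe v2-style inference request for the BLS orchestrator's inputs.
payload = {
    "inputs": [
        {"name": "INPUT", "shape": [1, 16], "datatype": "FP32", "data": [0.0] * 16},
        {"name": "N_STEPS", "shape": [1], "datatype": "INT32", "data": [4]},
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="triton-bls-mme-endpoint",   # placeholder endpoint name
    ContentType="application/octet-stream",   # assumed; depends on the Triton container
    TargetModel="variant_0042.tar.gz",        # one artifact per slice B variant
    Body=json.dumps(payload),
)
print(response["Body"].read().decode())
```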

I guess the next best option is to not use MME and maybe switch to MCE instead, but that seems like a big loss in performance.

What is the best way to deploy this model using SageMaker tools?
