1 Answer
The concurrency of a real-time endpoint depends on the number of workers maintained inside your algorithm container. Each worker needs to load its own copy of the model weights. In other words, you first need to configure the container to maintain multiple workers, and then make sure there is enough CPU and GPU memory to host multiple copies of the model.
I think the officially recommended GPU memory for Stable Diffusion is 10 GB, and g4dn.xlarge only comes with 16 GB of GPU memory, which would not be sufficient for 2 models running concurrently.
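As a rough sanity check of that arithmetic, here is a minimal sketch that estimates how many model copies fit on one GPU. The function name `max_workers` and the `headroom_gb` parameter are hypothetical, and the 10 GB per-model figure is only the approximate recommendation mentioned above, so treat the result as an upper bound rather than a guarantee.

```python
import math


def max_workers(gpu_memory_gb: float, per_model_gb: float, headroom_gb: float = 0.0) -> int:
    """Estimate how many model copies fit in GPU memory.

    headroom_gb is an assumed reserve for the CUDA context, activations, etc.
    """
    if per_model_gb <= 0:
        raise ValueError("per_model_gb must be positive")
    usable_gb = gpu_memory_gb - headroom_gb
    # Never report a negative worker count when headroom exceeds total memory.
    return max(0, math.floor(usable_gb / per_model_gb))


# 16 GB GPU (as on g4dn.xlarge) with ~10 GB per model copy.
print(max_workers(16, 10))
# A hypothetical 24 GB GPU with the same per-model requirement.
print(max_workers(24, 10))
```

With these numbers, a 16 GB GPU holds only one copy, while a 24 GB GPU would hold two, before accounting for any headroom.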
Could you please check the runtime GPU utilization, as well as the number of workers configured in your container?
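One possible way to check GPU usage at runtime is to query `nvidia-smi` from inside the container. The sketch below assumes `nvidia-smi` is available on the path there; the helper names `parse_gpu_stats` and `read_gpu_stats` are hypothetical, and the sample line is made-up data, not a real measurement. SageMaker also publishes GPU utilization metrics for endpoints to CloudWatch, which may be easier if you cannot get a shell in the container.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> list:
    """Parse the CSV output of the query above into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append(
            {
                "utilization_pct": util,
                "memory_used_mib": used,
                "memory_total_mib": total,
            }
        )
    return stats


def read_gpu_stats() -> list:
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


# Parsing a made-up sample line, so this runs without a GPU.
sample = "35, 10240, 15360\n"
print(parse_gpu_stats(sample))
```

If memory used by a single worker is already close to the total, adding a second worker on the same GPU is unlikely to succeed.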
answered a year ago
Thanks, I'll check whether it is possible to process requests concurrently by using another instance type with more GPU memory.