1 Answer
Hi,
When deploying large models, the ideal situation is to fit the entire model on a single GPU. This is the best option for performance because it eliminates the overhead of communication between GPU devices. Some models are simply too large to fit on a single GPU. Other models may fit on one GPU, but it can be more cost-effective to partition them across multiple cheaper GPUs.
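A quick back-of-the-envelope check can tell you whether a model is likely to fit on one GPU before you pick an instance type. The sketch below is a rough heuristic, not an AWS tool: it assumes FP16 weights (2 bytes per parameter) and a hypothetical ~20% overhead factor for activations and CUDA context; real memory usage varies by framework and batch size.

```python
# Rough estimate of whether a model's weights fit in a GPU's memory.
# Assumptions (not from AWS docs): FP16 weights = 2 bytes/parameter,
# plus ~20% overhead for activations, KV cache, and CUDA context.

def fits_on_gpu(num_params: int, gpu_mem_gib: float,
                bytes_per_param: int = 2, overhead: float = 1.2) -> bool:
    needed_gib = num_params * bytes_per_param * overhead / 2**30
    return needed_gib <= gpu_mem_gib

# A g4dn.xlarge has one NVIDIA T4 with 16 GiB of GPU memory.
print(fits_on_gpu(3_000_000_000, 16.0))   # 3B params in FP16: fits
print(fits_on_gpu(13_000_000_000, 16.0))  # 13B params: needs partitioning
```

If the estimate comes out over the single-GPU budget, that is the signal to consider either a larger GPU or partitioning the model across several smaller ones.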
In your case, you may consider an instance like g4dn.xlarge for inference. For details on using GPUs, refer to the official documentation.
Hope it helps.
answered a year ago