1 Answer
For best performance, Ollama needs to load the entire model into GPU memory. The GPU in a g5.2xlarge instance has about 24 GB of memory, while the Llama 3.1 8B model is about 4.9 GB in size, so it fits with room to spare. A g4dn instance has about 16 GB of GPU memory, which is also enough for this model.
You can find the list of EC2 instance types and their GPU memory sizes on the EC2 instance types page, under Accelerated Computing.
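As a rough illustration of the comparison above, here is a minimal Python sketch that checks whether a model fits in GPU memory. The memory and model sizes are the approximate figures quoted in this answer. The `fits_in_gpu` helper and the 2 GB `headroom_gb` default are assumptions made for illustration (a placeholder for context cache and runtime overhead), not values taken from Ollama or AWS documentation.

```python
# Approximate GPU memory per instance, in GB, as quoted in the answer above.
GPU_MEMORY_GB = {"g5.2xlarge": 24.0, "g4dn": 16.0}

# Approximate size of the Llama 3.1 8B model, in GB, as quoted above.
MODEL_SIZE_GB = 4.9


def fits_in_gpu(model_gb: float, gpu_gb: float, headroom_gb: float = 2.0) -> bool:
    """Return True if the model plus headroom fits in GPU memory.

    headroom_gb is a hypothetical allowance for context cache and runtime
    overhead; adjust it for your own workload.
    """
    return model_gb + headroom_gb <= gpu_gb


if __name__ == "__main__":
    for instance, gpu_gb in GPU_MEMORY_GB.items():
        verdict = "fits" if fits_in_gpu(MODEL_SIZE_GB, gpu_gb) else "does not fit"
        print(f"{instance}: {MODEL_SIZE_GB} GB model {verdict} in {gpu_gb} GB")
```

With these figures, both instance types pass the check. A larger model, or a longer context window, would need a bigger headroom value and might not fit on the smaller GPU.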
