Hi, have you had a look at this blog: https://aws.amazon.com/blogs/machine-learning/training-large-language-models-on-amazon-sagemaker-best-practices/ It covers the training-parallelism topics you raised. Another blog on the subject is https://aws.amazon.com/blogs/machine-learning/train-175-billion-parameter-nlp-models-with-model-parallel-additions-and-hugging-face-on-amazon-sagemaker/
For a successful use case of training a large model on AWS, you can look at BloombergGPT, a 50-billion-parameter language model that supports a wide range of tasks within the financial industry: https://arxiv.org/pdf/2303.17564.pdf
If you are using PEFT, you would not need to fine-tune all 10 billion parameters, so with the best practices above, I think you would be able to achieve your fine-tuning task.
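To make the PEFT point concrete, here is a hypothetical back-of-the-envelope calculation (my own illustration, not from the linked posts) of why a method like LoRA shrinks the trainable set so much: only low-rank adapter matrices of shape (d_in x r) and (r x d_out) are trained, while the full weight matrices stay frozen. The layer count and dimensions below are assumed round numbers for a ~10B-parameter model, not measurements of any specific architecture.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter adds to one d_in x d_out layer:
    one (d_in x rank) matrix plus one (rank x d_out) matrix."""
    return rank * (d_in + d_out)

# Illustrative assumption: roughly 300 attention/MLP projection matrices
# of size 4096 x 4096 carry adapters in a ~10B-parameter model.
full_params = 10_000_000_000
adapted_layers = 300
trainable = adapted_layers * lora_trainable_params(4096, 4096, rank=8)

print(f"trainable: {trainable:,}")                        # 19,660,800
print(f"fraction of full model: {trainable / full_params:.4%}")  # 0.1966%
```

Even with generous assumptions, well under 1% of the weights are updated, which is why PEFT fine-tuning fits on far smaller instances than full fine-tuning.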
There is a guide for large language model training with GPT-J here, but note that it covers 6B parameters, not 10B+. For further information on SageMaker Model Parallel (SMP), see https://www.amazon.science/blog/scaling-to-trillion-parameter-model-training-on-aws. In the case of Trainium, I'm not sure whether there are examples for SageMaker, but there is one example on EKS and one example on AWS ParallelCluster (GitHub).
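For reference, enabling SMP on a SageMaker training job comes down to passing a `distribution` config to the PyTorch or HuggingFace estimator. Below is a minimal sketch of that config dict only; the parameter values (partitions, tensor-parallel degree, microbatches, processes per host) are illustrative placeholders, not tuned recommendations, and you should size them to your instance type and model.

```python
# Hedged sketch: the `distribution` argument an estimator like
# sagemaker.pytorch.PyTorch(..., distribution=smp_distribution) would take
# to turn on SageMaker Model Parallel. Values below are assumptions.
smp_distribution = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "partitions": 4,              # pipeline-parallel partitions
                "tensor_parallel_degree": 2,  # illustrative, size to your model
                "ddp": True,                  # combine with data parallelism
                "microbatches": 8,            # pipeline microbatching
            },
        }
    },
    "mpi": {"enabled": True, "processes_per_host": 8},
}

print(smp_distribution["smdistributed"]["modelparallel"]["parameters"])
```

The GPT-J guide linked above walks through the full estimator and entry-point script; this fragment is just the piece that switches the job from plain training to model-parallel training.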