Best practices to fine-tune large language models on Sagemaker

1

I would like to fine-tune large language models (starting with 10+B parameters) on Sagemaker.

Since we are working with Pytorch and Lightning the idea would be to use DeepSpeed in combination with the Lightning trainer (https://lightning.ai/docs/pytorch/stable/advanced/model_parallel.html). The starting point will be fine-tuning some public model with the PEFT library from Huggingface.

I also saw an article about Trainium but that does not seem to work out-of-the-box and require changes to the model, which will prevent us from using publicly available per-trained models (if I am not mistaken).

I would like to know if there are resources on the topic, in particular:

  • guidance about different ways to leverage multi-GPU
  • example of successful use-cases
  • guidance about cost/performance trade-offs
1개 답변
0

Hi, Did you have a look at this blog: https://aws.amazon.com/blogs/machine-learning/training-large-language-models-on-amazon-sagemaker-best-practices/ It highlights your topics related to training parallelism. Another blog on the topic is https://aws.amazon.com/blogs/machine-learning/train-175-billion-parameter-nlp-models-with-model-parallel-additions-and-hugging-face-on-amazon-sagemaker/

On successful use case of training model on AWS, you can look at BloombergGPT, a 50 billion parameter language model that supports a wide range of tasks within the financial industry: https://arxiv.org/pdf/2303.17564.pdf

If you are using PEFT, you would not need to fine-tune complete 10Billion parameters, so with above best practices, I think you would be able to achieve fine-tuning task.

답변함 10달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠