Best practices to fine-tune large language models on SageMaker


I would like to fine-tune large language models (starting at 10B+ parameters) on SageMaker.

Since we are working with PyTorch and Lightning, the idea would be to use DeepSpeed in combination with the Lightning Trainer (https://lightning.ai/docs/pytorch/stable/advanced/model_parallel.html). The starting point will be fine-tuning a public model with the PEFT library from Hugging Face.
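For reference, this is a minimal sketch of what we have in mind; the model name, hyperparameters, and the LightningModule wrapper are placeholders rather than a tested setup:

```python
# Sketch only: fine-tuning a public Hugging Face checkpoint with Lightning's
# DeepSpeed strategy. Model name, devices, and hyperparameters are placeholders.
import lightning.pytorch as pl
import torch
from transformers import AutoModelForCausalLM

class CausalLMModule(pl.LightningModule):
    def __init__(self, model_name: str = "EleutherAI/gpt-neox-20b"):  # example public model
        super().__init__()
        # Full weights are loaded here; DeepSpeed ZeRO stage 3 shards parameters,
        # gradients, and optimizer states across ranks during training.
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        # batch is expected to contain input_ids, attention_mask, and labels
        out = self.model(**batch)
        self.log("train_loss", out.loss)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=1e-5)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,                        # e.g. one 8-GPU node
    strategy="deepspeed_stage_3",     # ZeRO stage 3 sharding
    precision="bf16-mixed",
    max_epochs=1,
)
# trainer.fit(CausalLMModule(), train_dataloaders=train_loader)  # train_loader assumed
```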

I also saw an article about Trainium, but that does not seem to work out of the box and requires changes to the model, which would prevent us from using publicly available pre-trained models (if I am not mistaken).

I would like to know if there are resources on the topic, in particular:

  • guidance on the different ways to leverage multiple GPUs
  • examples of successful use cases
  • guidance on cost/performance trade-offs
1 Answer

Hi, did you have a look at this blog: https://aws.amazon.com/blogs/machine-learning/training-large-language-models-on-amazon-sagemaker-best-practices/ ? It covers the topics you raised around training parallelism. Another blog on the subject is https://aws.amazon.com/blogs/machine-learning/train-175-billion-parameter-nlp-models-with-model-parallel-additions-and-hugging-face-on-amazon-sagemaker/
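Not taken from those blogs, but as a rough sketch of how such a multi-GPU job could be submitted with the SageMaker Python SDK; the entry-point script, IAM role, instance choice, framework versions, and the torch_distributed launcher option are placeholders to adapt to your account and training code:

```python
# Sketch only: submitting a distributed training script as a SageMaker training job.
# Entry point, role ARN, S3 paths, and versions below are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()

estimator = PyTorch(
    entry_point="train.py",            # your Lightning/DeepSpeed training script
    source_dir="src",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    instance_type="ml.p4d.24xlarge",   # 8 GPUs per node
    instance_count=2,                  # scale out across nodes
    framework_version="2.1",
    py_version="py310",
    distribution={"torch_distributed": {"enabled": True}},  # launch the script via torchrun
    hyperparameters={"epochs": 1, "model_name": "EleutherAI/gpt-neox-20b"},
    sagemaker_session=session,
)
# estimator.fit({"train": "s3://my-bucket/train/"})  # S3 path is a placeholder
```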

For a successful use case of training a model on AWS, you can look at BloombergGPT, a 50-billion-parameter language model that supports a wide range of tasks within the financial industry: https://arxiv.org/pdf/2303.17564.pdf

If you are using PEFT, you would not need to fine-tune all 10+ billion parameters, only a small set of adapter weights, so with the best practices above I think you will be able to accomplish the fine-tuning task.
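As a minimal sketch of that idea with PEFT/LoRA (the model name, rank, and target module names are examples and depend on the architecture you pick):

```python
# Sketch only: wrapping a pretrained causal LM with a LoRA adapter so that only
# a small fraction of the parameters is trainable.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")  # example public model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                # low-rank adapter dimension (example value)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # module names vary by model architecture
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # shows that trainable params are a small fraction of the total
```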

answered 9 months ago
