Best practices to fine-tune large language models on SageMaker


I would like to fine-tune large language models (starting with 10B+ parameters) on SageMaker.

Since we are working with PyTorch and Lightning, the idea is to use DeepSpeed in combination with the Lightning Trainer. The starting point will be fine-tuning a public model with the PEFT library from Hugging Face.

I also saw an article about Trainium, but that does not seem to work out of the box and requires changes to the model, which would prevent us from using publicly available pre-trained models (if I am not mistaken).

I would like to know if there are resources on the topic, in particular:

  • guidance on the different ways to leverage multi-GPU training
  • examples of successful use cases
  • guidance on cost/performance trade-offs

Hi, did you have a look at this blog? It covers the topics you raised about training parallelism. Another blog on the topic is

For a successful use case of training a model on AWS, you can look at BloombergGPT, a 50-billion-parameter language model that supports a wide range of tasks within the financial industry:

If you are using PEFT, you do not need to update all 10 billion parameters: the base model stays frozen and only a small set of adapter weights is trained. Combined with the best practices above, I think you should be able to complete the fine-tuning task.
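To make the PEFT point a bit more concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes LoRA as the PEFT method and a hypothetical 10B-class decoder (hidden size 5120, 40 layers, rank 8, two adapted square projection matrices per layer); none of these numbers come from a specific model. It also assumes mixed-precision Adam at roughly 16 bytes per trainable parameter and ignores activations, so treat the results as order-of-magnitude estimates only.

```python
# Back-of-the-envelope estimate of LoRA vs. full fine-tuning.
# All model dimensions below are illustrative assumptions, not a real model's config.

def lora_trainable_params(hidden_size, num_layers, rank, adapted_per_layer):
    """Trainable parameters added by LoRA.

    Each adapted d x d weight gets two low-rank factors (r x d and d x r),
    i.e. 2 * d * r extra trainable parameters.
    """
    per_matrix = 2 * hidden_size * rank
    return num_layers * adapted_per_layer * per_matrix


def state_gib(num_params, bytes_per_param):
    """Memory in GiB for num_params values at bytes_per_param each."""
    return num_params * bytes_per_param / 2**30


TOTAL_PARAMS = 10_000_000_000  # hypothetical 10B-parameter model
HIDDEN = 5120                  # assumed hidden size
LAYERS = 40                    # assumed number of transformer layers
RANK = 8                       # assumed LoRA rank
ADAPTED = 2                    # assumed: two square projections adapted per layer

lora_params = lora_trainable_params(HIDDEN, LAYERS, RANK, ADAPTED)
fraction = lora_params / TOTAL_PARAMS

# Mixed-precision Adam, per trainable parameter:
# 2 bytes fp16 weight + 2 bytes fp16 gradient
# + 12 bytes fp32 master weight, momentum, and variance = 16 bytes.
full_ft_gib = state_gib(TOTAL_PARAMS, 16)

# LoRA: frozen fp16 base weights (2 bytes each) plus full training
# state only for the adapter parameters.
lora_gib = state_gib(TOTAL_PARAMS, 2) + state_gib(lora_params, 16)

print(f"LoRA trainable parameters: {lora_params:,} ({fraction:.4%} of the model)")
print(f"Full fine-tuning state:    {full_ft_gib:.1f} GiB (before activations)")
print(f"LoRA fine-tuning state:    {lora_gib:.1f} GiB (before activations)")
```

Under these assumptions, full fine-tuning needs its weight, gradient, and optimizer state sharded across several GPUs (which is where DeepSpeed ZeRO helps), while the LoRA state is dominated by the frozen base weights. Activation memory depends on batch size and sequence length and comes on top of both figures.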

Answered 10 months ago
