Best practices to fine-tune large language models on SageMaker


I would like to fine-tune large language models (starting with 10B+ parameters) on SageMaker.

Since we are working with PyTorch and Lightning, the idea would be to use DeepSpeed in combination with the Lightning trainer. The starting point will be fine-tuning a public model with the PEFT library from Hugging Face.

I also saw an article about Trainium, but that does not seem to work out of the box and requires changes to the model, which would prevent us from using publicly available pre-trained models (if I am not mistaken).

I would like to know if there are resources on the topic, in particular:

  • guidance about different ways to leverage multi-GPU
  • examples of successful use cases
  • guidance about cost/performance trade-offs
1 Answer

Hi, did you have a look at this blog: It covers your questions about training parallelism. Another blog on the topic is
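To give a rough feel for why sharding across GPUs matters at this scale, here is a back-of-the-envelope sketch. It assumes full fine-tuning with mixed-precision Adam and uses the per-parameter accounting from the ZeRO paper (2 bytes for fp16 weights, 2 bytes for fp16 gradients, 12 bytes for fp32 optimizer states). The function name and the 10B-parameter, 8-GPU setup are illustrative assumptions, and activations, buffers and fragmentation are not counted, so real usage will be higher.

```python
# Rough per-GPU memory for model states only (weights, gradients, optimizer
# states) under mixed-precision Adam, following the ZeRO paper's accounting.
# Assumption: full fine-tuning; activations and temporary buffers are ignored.

def model_state_gb(num_params: float, num_gpus: int, zero_stage: int) -> float:
    """Return approximate model-state memory per GPU in GB (1 GB = 1e9 bytes)."""
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if zero_stage >= 1:
        optim /= num_gpus    # stage 1 shards optimizer states
    if zero_stage >= 2:
        grads /= num_gpus    # stage 2 also shards gradients
    if zero_stage >= 3:
        weights /= num_gpus  # stage 3 also shards the weights
    return num_params * (weights + grads + optim) / 1e9


if __name__ == "__main__":
    # Hypothetical setup: 10B parameters on 8 GPUs.
    for stage in range(4):
        print(f"ZeRO stage {stage}: {model_state_gb(10e9, 8, stage):.1f} GB per GPU")
```

Under these assumptions, only the higher ZeRO stages bring the model states of a 10B-parameter model within reach of a single accelerator's memory. With PEFT the gradient and optimizer terms shrink a lot, because only a small set of parameters is trained.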

For a successful use case of training a model on AWS, you can look at BloombergGPT, a 50-billion-parameter language model that supports a wide range of tasks within the financial industry:

If you are using PEFT, you would not need to fine-tune all 10 billion parameters, so with the above best practices, I think you should be able to complete the fine-tuning task.
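As a rough illustration of how few parameters a PEFT method can train, here is a small sketch assuming LoRA (one of the methods the PEFT library offers; the question does not say which one will be used). LoRA adds two low-rank matrices of shapes (d_out x r) and (r x d_in) per adapted weight matrix, so r * (d_in + d_out) trainable parameters each. The model shape below (hidden size 5120, 40 layers, adapters on two projections per layer, rank 8) is hypothetical and only meant to show the order of magnitude.

```python
# Count the trainable parameters LoRA adds, under stated assumptions.
# All model dimensions below are hypothetical, not taken from a specific model.

def lora_trainable_params(d_in: int, d_out: int, rank: int, num_matrices: int) -> int:
    """Each adapted matrix gets A (rank x d_in) and B (d_out x rank)."""
    return num_matrices * rank * (d_in + d_out)


if __name__ == "__main__":
    hidden = 5120          # assumed hidden size
    layers = 40            # assumed number of transformer layers
    adapted_per_layer = 2  # assumed: two attention projections per layer
    rank = 8               # assumed LoRA rank

    trainable = lora_trainable_params(hidden, hidden, rank, layers * adapted_per_layer)
    total = 10_000_000_000  # the 10B base model stays frozen
    print(f"trainable: {trainable:,} ({100 * trainable / total:.4f}% of base model)")
```

With these assumed numbers the adapters come to a few million parameters, far below one percent of the base model. The frozen base weights still have to fit in GPU memory, so multi-GPU sharding can remain relevant.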

answered 10 months ago
