Best practices to fine-tune large language models on Sagemaker


I would like to fine-tune large language models (starting with 10+B parameters) on Sagemaker.

Since we are working with Pytorch and Lightning the idea would be to use DeepSpeed in combination with the Lightning trainer ( The starting point will be fine-tuning some public model with the PEFT library from Huggingface.

I also saw an article about Trainium but that does not seem to work out-of-the-box and require changes to the model, which will prevent us from using publicly available per-trained models (if I am not mistaken).

I would like to know if there are resources on the topic, in particular:

  • guidance about different ways to leverage multi-GPU
  • example of successful use-cases
  • guidance about cost/performance trade-offs
1 Respuesta

Hi, Did you have a look at this blog: It highlights your topics related to training parallelism. Another blog on the topic is

On successful use case of training model on AWS, you can look at BloombergGPT, a 50 billion parameter language model that supports a wide range of tasks within the financial industry:

If you are using PEFT, you would not need to fine-tune complete 10Billion parameters, so with above best practices, I think you would be able to achieve fine-tuning task.

respondido hace 10 meses

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas