Falcon fine tuning running out of memory


Hi, I am trying to fine-tune Falcon-40B on Amazon SageMaker, but I keep running out of memory. I have tried the following instances: P3.16x, 24xlarge, and 12xlarge. Has anyone faced this error before, and how did you resolve it? Best,

vs
asked 9 months ago · 336 views

Hi, I tried following the blog post and notebook linked above and got a "no space left on device" error on an ml.g5.12xlarge instance. What should I do?

answered 9 months ago
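Note that "no space left on device" refers to disk storage (the volume holding the Hugging Face cache, by default `~/.cache/huggingface`, and any checkpoints), not GPU memory. By a rough estimate of about 40 billion parameters at 2 bytes each, the bf16 weights alone take on the order of 80 GB. A minimal sketch of a pre-flight check, assuming you know which directory the weights will be written to; `has_free_space` is a hypothetical helper built only on the standard library's `shutil.disk_usage`:

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem containing `path` has at least required_gb GB free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), all in bytes
    return usage.free >= required_gb * 1e9


if __name__ == "__main__":
    # Hypothetical budget: ~80 GB of bf16 weights for a ~40B-parameter model plus a margin.
    print(has_free_space(".", 100))
```

If the check fails, point the download/cache directory at a larger volume or increase the storage attached to the instance before starting.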
  • I would suggest creating a Support case so that a Support engineer can look into your specific issue in more detail. In general, there are several reasons an out-of-memory (OOM) error might occur:

    • Dataset sharding
    • Model shard size is too large
    • Model is not getting quantized
    • Input/output mode for the job
    • Model size itself while uploading
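To see why quantization matters so much here, a rough back-of-the-envelope estimate of the memory needed just to hold the weights helps. This is a sketch assuming about 40 billion parameters for Falcon-40B; it deliberately ignores activations, optimizer state, and framework overhead, so real usage is higher:

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed to hold the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n = 40e9  # assumption: ~40B parameters
    for p in BYTES_PER_PARAM:
        print(f"{p:>5}: {weight_memory_gb(n, p):6.1f} GB")
```

At bf16 this gives 80 GB for the weights alone, versus 20 GB at 4 bits, which is why a model that is silently not being quantized can run out of memory on hardware that should otherwise be sufficient.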

    Try splitting the model more granularly and use a better checkpointing strategy. Use techniques such as Sharded Data Parallelism.
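Sharded data parallelism (similar in spirit to ZeRO stage 3) splits parameters, gradients, and optimizer states across GPUs instead of replicating them on each one. A minimal sketch of the per-GPU arithmetic, assuming full fine-tuning with Adam in mixed precision and the common rule of thumb of 16 bytes per parameter; activations are not included, and the function name is hypothetical:

```python
def per_gpu_model_state_gb(num_params: float, num_gpus: int,
                           bytes_per_param: float = 16.0,
                           sharded: bool = True) -> float:
    """Per-GPU memory for model states (weights, grads, optimizer), in GB.

    Assumed 16 bytes/param = 2 (bf16 weights) + 2 (bf16 grads)
    + 4 (fp32 master weights) + 4 (Adam momentum) + 4 (Adam variance).
    """
    total_gb = num_params * bytes_per_param / 1e9
    # Plain data parallelism: every GPU keeps a full copy of the states.
    # Fully sharded: each GPU keeps roughly 1/num_gpus of them.
    return total_gb / num_gpus if sharded else total_gb


if __name__ == "__main__":
    for gpus in (8, 16, 32):
        print(gpus, per_gpu_model_state_gb(40e9, gpus))
```

For a ~40B-parameter model this is about 640 GB of model state in total, or 80 GB per GPU across 8 GPUs, which is far beyond a 16 GB GPU. That is why full fine-tuning at this size needs either many sharded GPUs or a parameter-efficient method such as QLoRA.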

    If all of this fails, submit a support ticket so the team can look into it.


Hello,

I understand that you encountered an OOM error while fine-tuning Falcon-40B on SageMaker using the following instances: P3.16x, 24xlarge, and 12xlarge.

In the blog post [1] and notebook example [2] below, an ml.g5.12xlarge instance was used to fine-tune Falcon-40B. Could you try the same instance, or choose a larger instance from the ml.g5 family [3]? For larger models, please try ml.p4d, ml.p4de, and ml.inf1 instances.
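When comparing instances, total GPU memory is a useful first filter. The sketch below assumes the model can be split across all GPUs on the instance (for example with automatic device placement). The GPU counts and sizes come from public instance specs and are worth double-checking. The 0.8 headroom factor is an arbitrary, hypothetical safety margin for activations and CUDA overhead:

```python
# Aggregate GPU memory per instance (GPU count x memory per GPU, in GB).
INSTANCE_GPU_MEMORY_GB = {
    "ml.p3.16xlarge": 8 * 16,    # 8 x NVIDIA V100, 16 GB each
    "ml.g5.12xlarge": 4 * 24,    # 4 x NVIDIA A10G, 24 GB each
    "ml.g5.48xlarge": 8 * 24,    # 8 x NVIDIA A10G, 24 GB each
    "ml.p4d.24xlarge": 8 * 40,   # 8 x NVIDIA A100, 40 GB each
    "ml.p4de.24xlarge": 8 * 80,  # 8 x NVIDIA A100, 80 GB each
}


def fits(instance: str, required_gb: float, headroom: float = 0.8) -> bool:
    """True if required_gb fits within `headroom` of the instance's total GPU memory."""
    return required_gb <= INSTANCE_GPU_MEMORY_GB[instance] * headroom


if __name__ == "__main__":
    # ~20 GB: 4-bit weights of a ~40B model; ~640 GB: full mixed-precision fine-tuning.
    for name in INSTANCE_GPU_MEMORY_GB:
        print(name, fits(name, 20.0), fits(name, 640.0))
```

Treat `required_gb` as a lower bound: sequence length and batch size add activation memory on top of it.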

To request a service quota increase for instances, open the AWS Service Quotas console, navigate to AWS services > Amazon SageMaker, and select Studio KernelGateway Apps running on ml.g5.12xlarge instances.

References

[1] https://aws.amazon.com/blogs/machine-learning/interactively-fine-tune-falcon-40b-and-other-llms-on-amazon-sagemaker-studio-notebooks-using-qlora/
[2] https://github.com/aws-samples/amazon-sagemaker-generativeai/blob/main/studio-notebook-fine-tuning/falcon-40b-qlora-finetune-summarize.ipynb
[3] https://aws.amazon.com/sagemaker/pricing/

AWS
answered 9 months ago
