Falcon fine-tuning running out of memory

Hi, I am trying to fine-tune Falcon 40B on Amazon SageMaker, but I keep running out of memory. I have tried the following instances: P3.16x, 24xlarge, 12xlarge. Has anyone faced this error before, and how did you resolve it? Best,

vs
asked 9 months ago, 336 views
2 Answers

Hi, I tried following the blog and notebook linked above, and got a "no space left on device" error with an ml.g5.12xlarge instance. What should I do?
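One thing worth checking first: "no space left on device" is a disk error rather than a GPU memory error, so the volume that holds the model download may simply be too small. The sketch below is only a rough check. It assumes a nominal 40 billion parameters stored at 2 bytes each (about 80 GB of weights); the helper names `checkpoint_size_gb` and `has_room` and the 20% headroom are illustrative and do not come from the blog or notebook.

```python
import shutil


def checkpoint_size_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate on-disk size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def has_room(path: str, needed_gb: float, headroom: float = 1.2) -> bool:
    """Return True if the filesystem holding `path` has needed_gb * headroom free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb * headroom


# Assumption: nominal 40e9 parameters in 16-bit precision.
falcon_40b_gb = checkpoint_size_gb(40e9)
print(f"Estimated weight download: {falcon_40b_gb:.0f} GB")
print("Enough free space here:", has_room(".", falcon_40b_gb))
```

If the check fails, the usual options are a larger storage volume for the notebook or training job, or pointing the download cache at a bigger disk.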

answered 9 months ago
  • I would suggest creating a Support case, because that way a Support engineer can look into the specific issue in a more fine-grained manner. In general, there are multiple reasons why an OOM error might occur:

    • Dataset sharding
    • Model shard size is too big
    • Model is not getting quantized
    • Input/output mode for the job
    • Model size itself while uploading

    Try splitting the model into smaller shards and use a better checkpointing strategy. Use techniques like Sharded Data Parallelism.

    If all of this fails, submit a support ticket so the team can look into it.
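To make the quantization and sharding points concrete, here is a back-of-the-envelope estimate of the memory needed for weights and optimizer state alone. It is a rough sketch, not a measurement: it assumes a nominal 40 billion parameters, uses the common rule of thumb of about 16 bytes per parameter for mixed-precision training with Adam, and ignores activations, adapter weights, and quantization overhead. The names `BYTES_PER_PARAM` and `estimate_gb` are made up for this example.

```python
# Approximate bytes of GPU memory per parameter (weights and optimizer state only).
BYTES_PER_PARAM = {
    # 2 (16-bit weights) + 2 (16-bit grads) + 12 (fp32 master copy, momentum, variance)
    "full fine-tuning, mixed precision + Adam": 16.0,
    "frozen 16-bit base model (e.g. LoRA)": 2.0,
    "frozen 8-bit base model": 1.0,
    "frozen 4-bit base model (e.g. QLoRA)": 0.5,
}


def estimate_gb(n_params: float) -> dict:
    """Map each training setup to its approximate memory need in GB (1e9 bytes)."""
    return {name: n_params * b / 1e9 for name, b in BYTES_PER_PARAM.items()}


N_PARAMS = 40e9  # assumption: nominal parameter count
for name, gb in estimate_gb(N_PARAMS).items():
    print(f"{name:45s} ~{gb:6.0f} GB")
```

Under these assumptions, full fine-tuning needs hundreds of GB before activations are counted, while a 4-bit frozen base model needs on the order of 20 GB for its weights, which is why quantization and sharding matter so much here.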

Hello,

I understand that you encountered an OOM error while fine-tuning Falcon 40B on SageMaker using the following instances: P3.16x, 24xlarge and 12xlarge.

In the following blog post [1] and notebook example [2], an ml.g5.12xlarge instance was used to fine-tune Falcon-40B. Could you kindly try the same instance, or choose a larger instance from the ml.g5 family [3]? For larger models, kindly try ml.p4d, ml.p4de and ml.inf1 instances.
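As a rough way to compare instance sizes, the sketch below totals the GPU memory of a few instance types and lists which ones meet a given requirement. The GPU counts and per-GPU memory are taken from the public instance specifications as I understand them, so please verify them against the instance pages. The requirement figures (about 20 GB for 4-bit weights, about 80 GB for 16-bit weights of a nominal 40-billion-parameter model) are assumptions for illustration, and `GPU_SPECS`, `total_gpu_gb` and `candidates` are hypothetical names.

```python
# (number of GPUs, GB of memory per GPU); figures to be verified against AWS docs.
GPU_SPECS = {
    "ml.p3.16xlarge": (8, 16),    # NVIDIA V100
    "ml.g5.12xlarge": (4, 24),    # NVIDIA A10G
    "ml.g5.48xlarge": (8, 24),    # NVIDIA A10G
    "ml.p4d.24xlarge": (8, 40),   # NVIDIA A100
    "ml.p4de.24xlarge": (8, 80),  # NVIDIA A100
}


def total_gpu_gb(instance: str) -> int:
    """Aggregate GPU memory of one instance in GB."""
    count, per_gpu = GPU_SPECS[instance]
    return count * per_gpu


def candidates(required_gb: float) -> list:
    """Instances whose aggregate GPU memory is at least `required_gb`."""
    return [name for name in GPU_SPECS if total_gpu_gb(name) >= required_gb]


# Assumed weight-only requirements for a nominal 40e9-parameter model.
for label, need in [("4-bit weights", 20.0), ("16-bit weights", 80.0)]:
    print(f"{label} (~{need:.0f} GB): {candidates(need)}")
```

Aggregate memory is only usable when the model is actually split across the GPUs, and activations plus optimizer state add to the weight-only figures, so treat the output as a lower bound rather than a guarantee.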

To request a service quota increase for these instances, open the AWS Service Quotas console, navigate to AWS services, then Amazon SageMaker, and select Studio KernelGateway Apps running on ml.g5.12xlarge instances.

References

[1] https://aws.amazon.com/blogs/machine-learning/interactively-fine-tune-falcon-40b-and-other-llms-on-amazon-sagemaker-studio-notebooks-using-qlora/
[2] https://github.com/aws-samples/amazon-sagemaker-generativeai/blob/main/studio-notebook-fine-tuning/falcon-40b-qlora-finetune-summarize.ipynb
[3] https://aws.amazon.com/sagemaker/pricing/

AWS
answered 9 months ago
