Falcon fine tuning running out of memory


Hi, I am trying to fine-tune Falcon-40B on Amazon SageMaker, but I keep running out of memory. I have tried the following instances: P3.16x, 24xlarge, 12xlarge. Has anyone faced this error before, and how did you resolve it? Best,

asked a year ago, 1.1K views
2 Answers

Hi, I tried following the blog and notebook linked in the other answer, and got a "no space left on device" error with an ml.g5.12xlarge instance. What should I do?
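
A "no space left on device" message normally points at disk storage rather than GPU or host memory, so one quick check is how much free space the relevant volumes have before the model weights are downloaded. The snippet below is only a minimal sketch using the Python standard library; the helper names and the choice of directories to inspect are assumptions for illustration, and the volume that actually fills up depends on where the notebook and the model cache are stored.

```python
import os
import shutil
import tempfile


def free_disk_gb(path: str = ".") -> float:
    """Return free space, in GB (10**9 bytes), on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for(path: str, required_gb: float) -> bool:
    """True if the volume that holds `path` has at least `required_gb` GB free."""
    return free_disk_gb(path) >= required_gb


# Hypothetical set of locations to inspect: the working directory,
# the home directory, and the system temp directory.
for location in (".", os.path.expanduser("~"), tempfile.gettempdir()):
    print(f"{location}: {free_disk_gb(location):.1f} GB free")
```

If the free space is smaller than the size of the model checkpoint files, the download or checkpointing step can fail with this error even when the GPU memory is sufficient.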

answered a year ago
  • I would suggest creating a Support case so that a Support engineer can look into your specific issue in more detail. In general, there are several reasons why an OOM error might occur:

    • Dataset sharding
    • Model shards that are too large
    • The model not getting quantized
    • Input/output mode for the job
    • The size of the model itself while uploading

    Try sharding the model more finely and use a better checkpointing strategy. Techniques such as sharded data parallelism can also help.

    If all of this fails, submit a support ticket so the team can look into it.


Hello,

I understand that you encountered an OOM error while fine-tuning Falcon-40B on SageMaker using the following instances: P3.16x, 24xlarge, and 12xlarge.

In the following blog post [1] and notebook example [2], an ml.g5.12xlarge instance was used to fine-tune Falcon-40B. Could you try the same instance, or choose a larger instance from the ml.g5 family [3]? For larger models, try ml.p4d, ml.p4de, or ml.inf1 instances.
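
As a rough illustration of why the quantized (QLoRA) approach in the blog post matters for memory, the footprint of the model weights alone can be estimated from the parameter count and the number of bits stored per parameter. This is a back-of-the-envelope sketch, not a measurement: it assumes a nominal 40 billion parameters (taken from the model name) and ignores activations, gradients, optimizer state, adapter weights, and framework overhead, all of which add to the real requirement. The function and constant names are illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count implied by the name "Falcon-40B".
FALCON_40B_PARAMS = 40e9

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(FALCON_40B_PARAMS, bits)
    print(f"{label:>9}: {gb:6.1f} GB for weights only")
```

Under these assumptions the weights need about 80 GB in 16-bit precision but about 20 GB in 4-bit, which is why a job that loads the model without quantization can run out of memory on an instance where the quantized setup fits.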

To request a service quota increase for these instances, open the AWS Service Quotas console, navigate to AWS services, then Amazon SageMaker, and select "Studio KernelGateway Apps running on ml.g5.12xlarge instances".

References

[1] https://aws.amazon.com/blogs/machine-learning/interactively-fine-tune-falcon-40b-and-other-llms-on-amazon-sagemaker-studio-notebooks-using-qlora/
[2] https://github.com/aws-samples/amazon-sagemaker-generativeai/blob/main/studio-notebook-fine-tuning/falcon-40b-qlora-finetune-summarize.ipynb
[3] https://aws.amazon.com/sagemaker/pricing/

AWS
answered a year ago
