Falcon fine tuning running out of memory


Hi, I am trying to fine-tune Falcon-40B on Amazon SageMaker, but I keep running out of memory. I have tried the following instances: P3.16x, 24xlarge, 12xlarge. Has anyone faced this error before, and how did you resolve it? Best,

vs
Asked 9 months ago · 336 views
2 Answers

Hi, I tried following the blog and notebook linked in the other answer and got a "no space left on device" error on an ml.g5.12xlarge instance. What should I do?

Answered 9 months ago
  • I would suggest creating a Support case so that a Support engineer can look into your specific issue in more detail. In general, there are several reasons why an OOM error might occur:

    • Dataset sharding
    • Model shards that are too large
    • The model is not getting quantized
    • The input/output mode configured for the job
    • The size of the model itself when it is uploaded

    Try splitting the model into smaller shards and use a more economical checkpointing strategy. Techniques such as sharded data parallelism can also help.

    If all of this fails, submit a support ticket so the team can look into it.
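    To see why quantization matters so much here, a back-of-the-envelope estimate helps. The sketch below is only an illustration: it assumes "40b" means roughly 40 billion parameters and counts the weights alone, ignoring activations, gradients, optimizer state, and framework overhead, all of which add to the real footprint.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: Falcon-40B has approximately 40 billion parameters.
FALCON_40B_PARAMS = 40e9

for label, bits in [("fp32", 32), ("bf16/fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(FALCON_40B_PARAMS, bits)
    print(f"{label:>10}: {gib:6.1f} GiB")
```

    Under these assumptions the weights alone take about 75 GiB in 16-bit precision but under 20 GiB in 4-bit, which is why a run where quantization silently fails to apply can exhaust GPU memory even on a large instance.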


Hello,

I understand that you encountered an OOM error while fine-tuning Falcon-40B on SageMaker using the following instances: P3.16x, 24xlarge and 12xlarge.

In the following blog post [1] and notebook example [2], an ml.g5.12xlarge instance was used to fine-tune Falcon-40B. Could you kindly try the same instance, or choose a larger instance from the ml.g5 family [3]? For larger models, kindly try ml.p4d, ml.p4de and ml.inf1 instances.
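As a rough way to compare candidate instances, the sketch below lists total GPU memory for a few instance types and filters by a required amount. The `headroom` multiplier is an arbitrary assumption of mine for activations and other overhead, not a figure from the blog post, and total memory only helps if the model is actually spread across all GPUs on the instance.

```python
# Total GPU memory per instance in GB (number of GPUs x memory per GPU).
GPU_MEMORY_GB = {
    "ml.p3.16xlarge": 8 * 16,    # 8 x V100 16 GB
    "ml.g5.12xlarge": 4 * 24,    # 4 x A10G 24 GB
    "ml.g5.48xlarge": 8 * 24,    # 8 x A10G 24 GB
    "ml.p4d.24xlarge": 8 * 40,   # 8 x A100 40 GB
    "ml.p4de.24xlarge": 8 * 80,  # 8 x A100 80 GB
}


def instances_that_fit(required_gb: float, headroom: float = 1.5) -> list[str]:
    """Return instance types with enough total GPU memory, smallest first.

    headroom is a hypothetical safety factor, not a measured value.
    """
    needed = required_gb * headroom
    fitting = [name for name, gb in GPU_MEMORY_GB.items() if gb >= needed]
    return sorted(fitting, key=GPU_MEMORY_GB.get)


# Example: about 20 GB of 4-bit weights versus about 80 GB of 16-bit weights.
print(instances_that_fit(20))
print(instances_that_fit(80))
```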

To request a service quota increase for instances, on the AWS Service Quotas console, navigate to AWS services, Amazon SageMaker, and select Studio KernelGateway Apps running on ml.g5.12xlarge instances.
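The same request can also be made programmatically. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names are my own, and the exact quota name should be checked against what the Service Quotas API returns for your account.

```python
def find_quotas(quotas, name_fragment):
    """Return the quotas whose name contains name_fragment (case-insensitive)."""
    fragment = name_fragment.lower()
    return [q for q in quotas if fragment in q["QuotaName"].lower()]


def list_sagemaker_quotas():
    """Fetch all SageMaker quotas. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so find_quotas works without boto3

    client = boto3.client("service-quotas")
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


def request_increase(quota_code, desired_value):
    """Submit a quota increase request for one SageMaker quota."""
    import boto3

    client = boto3.client("service-quotas")
    return client.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota_code,
        DesiredValue=float(desired_value),
    )
```

A typical use would be to call `find_quotas(list_sagemaker_quotas(), "ml.g5.12xlarge")`, pick the entry for Studio KernelGateway apps, and pass its `QuotaCode` to `request_increase`.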

References

[1] https://aws.amazon.com/blogs/machine-learning/interactively-fine-tune-falcon-40b-and-other-llms-on-amazon-sagemaker-studio-notebooks-using-qlora/
[2] https://github.com/aws-samples/amazon-sagemaker-generativeai/blob/main/studio-notebook-fine-tuning/falcon-40b-qlora-finetune-summarize.ipynb
[3] https://aws.amazon.com/sagemaker/pricing/

AWS
Answered 9 months ago
