Cannot load GPT-J 6B on a 32 GB instance from Hugging Face


I am using an ml.g4dn.2xlarge instance on SageMaker to test the GPT-J 6B model from Hugging Face with the Transformers library.

I am using revision="float16" and low_cpu_mem_usage=True so that the model checkpoint is only about 12 GB.

The model downloads, but right after that the kernel suddenly crashes.

Please share a workaround. The instance has 32 GB of memory with 8 vCPUs.

!pip install transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", low_cpu_mem_usage=True) # It crashes here
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

It downloads the full 12 GB checkpoint, but crashes immediately afterwards.
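For completeness, here is the load call I would expect to work based on the GPT-J model card. The torch_dtype=torch.float16 argument is my assumption about what might be missing: without it, Transformers upcasts the fp16 weights to float32 (roughly 24 GB) during loading.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Keep the checkpoint in half precision instead of letting it upcast to
# float32, so peak CPU RAM stays near the 12 GB file size.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,  # assumption: this is the missing argument
    low_cpu_mem_usage=True,
)
model = model.to("cuda")  # the T4's ~15 GB should fit the ~12 GB fp16 weights
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")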

I tried to follow another thread on this, but I still can't update sentencepiece.

Please help. Thanks

  • Are you using SageMaker Notebooks or Studio? I wonder if it's a lack of memory; can you try with a larger instance?

  • Hello @Durga_S, I am using SageMaker Notebooks. The "float16" revision of the model is already 12.1 GB, so I first tried a setup with 16 GB of memory, which didn't work. Then, as stated in the question, I moved to the ml.g4dn.2xlarge instance, which has 32 GB of RAM and a T4 GPU with around 15 GB of memory, but I am still unable to load the model. I am not sure whether increasing the instance size will help, since I am already on a 32 GB instance, so this must be some other issue. Please assist, thanks.

  • Perhaps you can try the example here: https://github.com/marckarp/amazon-sagemaker-gptj

EM_User
Asked 1 year ago · 323 views
1 Answer

Hi, can you please try again after changing the instance type from ml.g4dn.2xlarge to ml.g5.12xlarge? I was able to load GPT-J 6B successfully by following the steps in the article below: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text-generation-few-shot-learning.ipynb
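For reference, here is a minimal sketch of the JumpStart deployment flow that notebook walks through. The model_id and the payload key are my assumptions based on similar JumpStart examples, so check the notebook for the exact values.

from sagemaker.jumpstart.model import JumpStartModel

# model_id is assumed; the notebook lists the exact JumpStart identifier
model = JumpStartModel(model_id="huggingface-textgeneration1-gpt-j-6b")
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # 4x A10G GPUs, 192 GB RAM
)

# Payload shape assumed from the JumpStart text-generation examples
response = predictor.predict({"text_inputs": "My name is Lewis and I like to"})
print(response)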

AWS
Answered 7 months ago
  • Hopefully it works, but the thing is, purchasing a bigger instance should essentially solve the issue on its own. Nevertheless, I have since moved to other libraries, such as vLLM and llama.cpp, that provide much cheaper and faster inference (see the sketch after this comment).

    Thanks for your support. Still, if anything comes up in the future, this thread might help somebody, someday.
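    For anyone who lands here later, a minimal vLLM sketch for GPT-J, assuming a GPU with roughly 16 GB of memory for the fp16 weights plus the KV cache:

    from vllm import LLM, SamplingParams

    # Load GPT-J 6B in half precision; vLLM manages batching and the KV cache
    llm = LLM(model="EleutherAI/gpt-j-6B", dtype="float16")
    params = SamplingParams(temperature=0.8, max_tokens=50)

    outputs = llm.generate(["My name is Lewis and I like to"], params)
    print(outputs[0].outputs[0].text)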
