Cannot load GPT-J 6B from Hugging Face on a 32 GB instance

I used an ml.g4dn.2xlarge instance on SageMaker to test the GPT-J 6B model from Hugging Face with the Transformers library.

I am using revision="float16" and low_cpu_mem_usage=True, so the checkpoint is only about 12 GB (6B parameters × 2 bytes each).

The model downloads, but immediately afterwards the kernel crashes.

Please share a workaround. That instance has 32 GB of memory and 8 vCPUs.

!pip install transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the fp16 branch of the checkpoint (~12 GB on disk)
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", low_cpu_mem_usage=True)  # It crashes here
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

The 12 GB checkpoint downloads successfully, but the kernel crashes right after.

I tried to follow this thread, but I still can't update sentencepiece.

Please help. Thanks.

  • Are you using SageMaker Notebooks or Studio? I wonder if it's a lack of memory; can you try a larger instance?

  • Hello @Durga_S, I am using SageMaker Notebooks. The "float16" revision of the model is already 12.1 GB, so I first tried a setup with 16 GB of memory, which didn't work. Then, as mentioned in the question, I moved to the ml.g4dn.2xlarge instance, which has 32 GB of RAM and a T4 GPU with about 15 GB of memory. I am still unable to load the model, and I am not sure that increasing the instance size will help, since I am already on a 32 GB instance; this must be some other issue. Please assist, thanks.

  • Perhaps you can try the example here? https://github.com/marckarp/amazon-sagemaker-gptj

1 Answer

Hi, can you please try again after changing the instance type from ml.g4dn.2xlarge to ml.g5.12xlarge? I was able to load GPT-J 6B successfully by following the steps in the article below: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text-generation-few-shot-learning.ipynb
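
If you would rather stay on the 32 GB instance, one commonly cited cause of this exact crash is that from_pretrained loads the fp16 checkpoint into the model's default float32 dtype unless torch_dtype is passed, roughly doubling host memory to ~24 GB during loading. Below is a minimal sketch of the fp16-only load using standard Transformers arguments; it is not taken from the linked notebook, so treat it as a starting point rather than a verified fix.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Keep the weights in fp16 instead of upcasting to float32 at load time.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# The T4's ~15 GB of GPU memory is enough for the ~12 GB of fp16 weights.
model = model.to("cuda")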

AWS
answered 7 months ago
  • Hopefully that works, though purchasing a bigger instance essentially buys your way around the issue rather than fixing it. In any case, I have since moved to other libraries that offer much cheaper and faster inference, such as vLLM and llama.cpp (a minimal example follows below).

    Thanks for your support. Still, if there is anything in the future, this thread might help somebody, someday.
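
    For completeness, here is a minimal sketch of the vLLM route mentioned above, using vLLM's standard offline-inference interface (LLM, SamplingParams, generate); GPT-J is among its supported architectures, though the exact sampling settings here are illustrative.

    from vllm import LLM, SamplingParams

    # vLLM loads the weights in fp16 and manages GPU memory itself.
    llm = LLM(model="EleutherAI/gpt-j-6B", dtype="float16")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)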
