Deploy a HuggingFace pretrained model (multiple bin files) to AWS SageMaker


I have been trying to deploy this HuggingFace model ( https://huggingface.co/bigcode/starcoderplus/tree/main ) to AWS SageMaker, but it failed. The error message from CloudWatch is

"No safetensors weights found for model bigcode/starcoder at revision None. Converting PyTorch weights to safetensors."

This model has 7 bin files. I believe it was trained in "model_parallel" mode and that I need to merge them into one bin file before SageMaker can deploy it (just my guess).

I successfully deployed a BERT model from HuggingFace, which has only one "pytorch_model.bin".

What do I need to do to successfully deploy the model with multiple bin files?

Jeremy
asked 10 months ago · 921 views
1 Answer

It’s my understanding that you’ll need to merge them into a single pytorch_model.bin. I don’t recall my original source, but I had this saved in my notes, so please make any necessary changes for your project.

  • Create a new directory and copy all the bin files of your model into it.
  • Install the torch library if you haven't already done so: pip install torch
  • Use the following Python code to merge the bin files into one:
import os
import torch

bin_files_path = "path/to/your/bin/files/directory"
output_path = "path/to/output/merged/bin/file/pytorch_model.bin"

# Create an empty state dictionary
state_dict = {}

# Merge on CPU: the shards only need to be read and concatenated,
# and a large model may not fit in GPU memory anyway
device = torch.device("cpu")

# Load the weights from each bin file and merge them into the state dictionary
for bin_file in sorted(os.listdir(bin_files_path)):
    if bin_file.endswith(".bin"):
        model_state_dict = torch.load(os.path.join(bin_files_path, bin_file), map_location=device)
        state_dict.update(model_state_dict)

# Save the merged state dictionary as a single bin file
torch.save(state_dict, output_path)

After merging the bin files into one, you should be able to deploy your model to SageMaker using the merged pytorch_model.bin file.
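One caveat worth checking, as an assumption about how the checkpoint was saved: sharded transformers checkpoints usually ship with a shard index file (typically pytorch_model.bin.index.json), and if it is left next to the merged file the loader may keep looking for the original shards. A minimal cleanup sketch:

import os

bin_files_path = "path/to/your/bin/files/directory"

# Remove the shard index (if present) so only the merged
# pytorch_model.bin is picked up when the model is loaded
index_file = os.path.join(bin_files_path, "pytorch_model.bin.index.json")
if os.path.exists(index_file):
    os.remove(index_file)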

AWS
jwsink
answered 10 months ago
  • I got past this error. SageMaker will actually do the conversion for me; I just needed to give it more time.

    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.8xlarge",
        container_startup_health_check_timeout=1200,
    )
    

    Set container_startup_health_check_timeout to a larger value and it will get past this error.
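    For context, here is a minimal sketch of how that deploy call can fit into a full deployment, assuming the HuggingFace LLM (TGI) container; the image version, environment variables, and instance type are assumptions to adapt for your setup:

    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # SageMaker execution role with permission to create the endpoint
    role = sagemaker.get_execution_role()

    # Container image for hosting LLMs (Text Generation Inference)
    llm_image = get_huggingface_llm_image_uri("huggingface")

    huggingface_model = HuggingFaceModel(
        image_uri=llm_image,
        env={
            "HF_MODEL_ID": "bigcode/starcoderplus",  # model pulled from the Hub at startup
            "SM_NUM_GPUS": "1",                      # number of GPUs to shard across
        },
        role=role,
    )

    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.8xlarge",
        # give the container time to convert the PyTorch weights to safetensors
        container_startup_health_check_timeout=1200,
    )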

    Then I encountered the next error:

    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 22.20 GiB total capacity; 19.72 GiB already allocated; 143.12 MiB free; 
    21.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  
    See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
    

    I upgraded to a bigger instance type and experimented with the PYTORCH_CUDA_ALLOC_CONF parameter, but the error persisted.
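    One possible explanation, as an assumption based on the model size: StarCoderPlus has about 15.5B parameters, which needs roughly 30 GB in fp16, more than the ~24 GB on the single A10G GPU of a g5.8xlarge, so PYTORCH_CUDA_ALLOC_CONF alone cannot fix it. Sharding across several GPUs or quantizing is one direction to try; a sketch reusing llm_image and role from the snippet above, with SM_NUM_GPUS and HF_MODEL_QUANTIZE as container settings to verify:

    huggingface_model = HuggingFaceModel(
        image_uri=llm_image,
        env={
            "HF_MODEL_ID": "bigcode/starcoderplus",
            "SM_NUM_GPUS": "4",  # shard the model across all 4 GPUs of a g5.12xlarge
            # "HF_MODEL_QUANTIZE": "bitsandbytes",  # optional: quantize to fit in less memory
        },
        role=role,
    )

    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.12xlarge",  # 4 x A10G (24 GB each)
        container_startup_health_check_timeout=1200,
    )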
