Deploy HuggingFace pretrained model (multiple bin files) to AWS SageMaker


I have been trying to deploy this HuggingFace model ( ) to AWS SageMaker, but it failed. The error message from CloudWatch is:

"No safetensors weights found for model bigcode/starcoder at revision None. Converting PyTorch weights to safetensors."

This model has 7 bin files. I believe it was trained in "model_parallel" mode and that I need to merge them into one bin file before SageMaker can deploy it (just my guess).

I successfully deployed a BERT model from HuggingFace, which has only one "pytorch_model.bin".

What do I need to do to successfully deploy the model with multiple bin files?

asked a year ago, 948 views
1 Answer

It's my understanding that you'll need to merge them into a single pytorch_model.bin. I don't recall my original source, but I had this saved in my notes, so please make any changes your project needs.

  • Create a new directory and copy all the bin files of your model into it.
  • Install the torch library if you haven't already: pip install torch
  • Use the following Python code to merge the bin files into one:
import os
import torch

bin_files_path = "path/to/your/bin/files/directory"
# Keep output_path outside bin_files_path so a re-run doesn't pick up the merged file
output_path = "path/to/output/merged/bin/file/pytorch_model.bin"

# Create an empty state dictionary
state_dict = {}
# Load shards onto the CPU: merging a large model on a single GPU can run out of memory
device = torch.device("cpu")

# Load the weights from each bin file and merge them into the state dictionary
for bin_file in sorted(os.listdir(bin_files_path)):
    if bin_file.endswith(".bin"):
        shard_state_dict = torch.load(os.path.join(bin_files_path, bin_file), map_location=device)
        state_dict.update(shard_state_dict)

# Save the merged state dictionary as a single bin file
torch.save(state_dict, output_path)
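The loop above only works if every tensor name lives in exactly one shard, and listing the directory can also pick up unrelated .bin files (for example a training_args.bin, if one is present). Below is a minimal sketch, assuming the checkpoint ships the usual Hugging Face pytorch_model.bin.index.json next to the shards (check your download), that takes the shard list from the index's weight_map and refuses to merge if a name shows up twice. The helper names are hypothetical, and the values are treated as opaque so the logic can be checked without torch.

```python
import json
import os
from typing import Any, Iterable, List, Mapping


def shard_files_from_index(index: Mapping[str, Any]) -> List[str]:
    """Return the unique shard file names listed in a sharded-checkpoint index.

    Assumes the Hugging Face layout: {"metadata": {...}, "weight_map": {tensor_name: shard_file}}.
    """
    return sorted(set(index["weight_map"].values()))


def load_index(checkpoint_dir: str, index_name: str = "pytorch_model.bin.index.json") -> dict:
    """Read the index JSON from the checkpoint directory (hypothetical helper)."""
    with open(os.path.join(checkpoint_dir, index_name)) as f:
        return json.load(f)


def merge_shards(shards: Iterable[Mapping[str, Any]]) -> dict:
    """Merge per-shard state dicts, failing loudly if any tensor name repeats."""
    merged: dict = {}
    for shard in shards:
        overlap = merged.keys() & shard.keys()
        if overlap:
            # A repeated name would mean update() silently overwrites a tensor
            raise ValueError(f"tensor names found in more than one shard: {sorted(overlap)[:5]}")
        merged.update(shard)
    return merged
```

With torch installed, the shards would be fed in as something like merge_shards(torch.load(os.path.join(d, f), map_location="cpu") for f in shard_files_from_index(load_index(d))), and the result saved with torch.save as before.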

After merging the bin files into one, you should be able to deploy your model to SageMaker using the merged pytorch_model.bin file.
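When deploying from your own files rather than pulling from the Hub, SageMaker takes the model artifacts as a model.tar.gz archive (typically uploaded to S3 and passed as model_data). A minimal standard-library sketch, assuming the merged pytorch_model.bin, config.json, and tokenizer files sit in one directory and should land at the archive root; package_model_dir is a hypothetical helper name, and the demo uses placeholder files instead of real weights.

```python
import os
import tarfile
import tempfile


def package_model_dir(model_dir: str, output_path: str) -> str:
    """Write every entry of model_dir into a gzip tarball, at the archive root.

    output_path should be outside model_dir, or the archive would try to include itself.
    """
    with tarfile.open(output_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname=name drops the directory prefix so files sit at the root
            tar.add(os.path.join(model_dir, name), arcname=name)
    return output_path


if __name__ == "__main__":
    # Tiny demo with placeholder files standing in for the real model files
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = os.path.join(tmp, "model")
        os.makedirs(model_dir)
        for fname in ("config.json", "pytorch_model.bin"):
            with open(os.path.join(model_dir, fname), "w") as f:
                f.write("placeholder")
        archive = package_model_dir(model_dir, os.path.join(tmp, "model.tar.gz"))
        with tarfile.open(archive, "r:gz") as tar:
            print(tar.getnames())  # ['config.json', 'pytorch_model.bin']
```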

answered a year ago
  • I got past this error. SageMaker actually does the conversion for me, but I need to give it more time.

    predictor = huggingface_model.deploy(

    Setting container_startup_health_check_timeout to a larger value gets past this error.

    But then I hit the next error:

    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 22.20 GiB total capacity; 19.72 GiB already allocated; 143.12 MiB free; 
    21.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  
    See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    I upgraded to a bigger instance type and experimented with the PYTORCH_CUDA_ALLOC_CONF setting, but the error persisted.
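A back-of-envelope check helps explain why the out-of-memory error persists: the log reports 22.20 GiB on GPU 0, and if the weights alone are larger than that, no allocator tuning will make them fit on one GPU. The sketch below assumes StarCoder has roughly 15.5B parameters loaded in fp16/bf16 (2 bytes each) and, going by the safetensors log line, that the endpoint runs the Hugging Face LLM (TGI) container, which reads SM_NUM_GPUS to shard the model across GPUs. All helper names and the 900-second timeout are illustrative assumptions, not values from the thread.

```python
import math

# Assumptions (not stated in the thread): ~15.5B parameters, 2 bytes per parameter,
# and a TGI-based container that shards the model according to SM_NUM_GPUS.
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GiB; KV cache and CUDA overhead come on top."""
    return num_params * bytes_per_param / GIB


def min_gpus_for_weights(num_params: float, gpu_capacity_gib: float, bytes_per_param: int = 2) -> int:
    """Lower bound on GPUs needed if the weights are split evenly across them."""
    return math.ceil(weight_memory_gib(num_params, bytes_per_param) / gpu_capacity_gib)


def build_deploy_settings(instance_type: str, num_gpus: int, startup_timeout_s: int = 900):
    """Return (env, deploy_kwargs) for a HuggingFaceModel; values are illustrative."""
    env = {"SM_NUM_GPUS": str(num_gpus)}  # assumed TGI container variable
    deploy_kwargs = {
        "initial_instance_count": 1,
        "instance_type": instance_type,
        # Seconds SageMaker waits for the container to pass its startup health check
        "container_startup_health_check_timeout": startup_timeout_s,
    }
    return env, deploy_kwargs


if __name__ == "__main__":
    print(f"{weight_memory_gib(15.5e9):.1f} GiB")  # 28.9 GiB, more than the 22.20 GiB in the log
    print(min_gpus_for_weights(15.5e9, 22.20))     # 2 (weights only, a lower bound)
```

If that estimate holds, a larger instance only helps when it has several GPUs (for example ml.g5.12xlarge, which has four A10G GPUs) and the container is told to use them; the env dict would go to HuggingFaceModel(env=...) and the kwargs to huggingface_model.deploy(**deploy_kwargs).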
