1 Answer
It’s my understanding that you’ll need to merge them into a single pytorch_model.bin file.
I don’t recall my original source, but I had this saved in my notes, so please make any necessary changes for your project.
- Create a new directory and copy all the bin files of your model into it.
- Install the torch library if you haven't already done so: `pip install torch`
- Use the following Python code to merge the bin files into one:
```python
import os
import torch

bin_files_path = "path/to/your/bin/files/directory"
output_path = "path/to/output/merged/bin/file/pytorch_model.bin"

# Create an empty state dictionary
state_dict = {}
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the weights from each bin file and merge them into the state dictionary
for bin_file in sorted(os.listdir(bin_files_path)):
    if bin_file.endswith(".bin"):
        model_state_dict = torch.load(os.path.join(bin_files_path, bin_file), map_location=device)
        state_dict.update(model_state_dict)

# Save the merged state dictionary as a single bin file
torch.save(state_dict, output_path)
```
After merging the bin files into one, you should be able to deploy your model to SageMaker using the merged pytorch_model.bin file.
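For the deployment step itself, here is a minimal sketch using the SageMaker Python SDK's HuggingFaceModel class. The S3 path, IAM role, framework versions, and instance type below are placeholder assumptions, not values from this thread; SageMaker also expects the model directory (including the merged pytorch_model.bin) to be packaged as a model.tar.gz in S3.

```python
# A minimal sketch, assuming a Hugging Face model whose directory (with the
# merged pytorch_model.bin) has been packaged as model.tar.gz and uploaded to S3.
# The S3 path, IAM role, framework versions, and instance type are placeholders.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

model = HuggingFaceModel(
    model_data="s3://your-bucket/model/model.tar.gz",
    role=role,
    transformers_version="4.26",  # match the versions your model was trained with
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)
```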
I got past this error. SageMaker will actually do the conversion for me, but it needs more time.
Set container_startup_health_check_timeout to a larger value and it will get past this error, as in the sketch below.
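As a minimal sketch, assuming the model object from the answer above: recent versions of the SageMaker Python SDK accept container_startup_health_check_timeout as a deploy parameter, and 900 seconds here is an illustrative value, not a tested recommendation.

```python
# A minimal sketch, reusing the HuggingFaceModel from the answer above.
# 900 seconds is an illustrative value; tune it to how long conversion takes.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",               # placeholder instance type
    container_startup_health_check_timeout=900,  # seconds to wait for container startup
)
```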
But then I encountered the next error.
I upgraded to a bigger instance type and experimented with the PYTORCH_CUDA_ALLOC_CONF parameter, but the error persisted; one way to pass that variable is shown below.
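For reference, a minimal sketch of passing PYTORCH_CUDA_ALLOC_CONF to the endpoint container through the model's env parameter; the allocator setting shown is an illustrative value, not a tested fix, and the S3 path and framework versions are the same placeholders as above.

```python
# A minimal sketch: pass PYTORCH_CUDA_ALLOC_CONF through the container environment.
# max_split_size_mb:512 is an illustrative allocator setting, not a tested fix.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://your-bucket/model/model.tar.gz",  # placeholder S3 path
    role=role,                                         # IAM role as before
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env={
        # Cap how large CUDA allocator blocks can grow before splitting,
        # which can reduce memory fragmentation on the GPU.
        "PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:512",
    },
)
```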