How to clone large Hugging Face repos to EC2 using Git LFS without failures?


Hey all,

I'm having trouble downloading large Hugging Face repositories to EC2 instances using Git LFS. When I try to clone a repo of around 14GB, the EC2 instance times out and stops during the download. I'm reaching out to see if anyone has encountered similar problems and if there are alternative solutions I might have missed.

Problem Overview:

I'm attempting to download a large file (14GB) from a Hugging Face repository using Git LFS. The process starts as expected, but partway through the download the EC2 instance times out. What's even more puzzling is that the instance becomes completely unresponsive, so I can't rejoin the session. This isn't a one-off; I've observed the same behavior across different instance types and regions.

I've tried various instance types, from t2.micro up to t2.large, in different regions. In every case, the instance fails a status check and becomes unreachable during the git clone command.

On the t2.micro instances, the status check failed because the instance ran out of memory. On the t2.large instance, the volume reached its throughput limit and was throttled, and the network interface was dropping packets.

My Local Setup:

What surprises me is that I can download the repo seamlessly on my laptop, which has 16GB of RAM and 512GB of storage. Given that, I would expect EC2 instances of a similar size to handle the task without a hitch, especially given how simple the task is, i.e. running a single Git LFS clone of a 14GB file.

For reference, here are the commands I run to initiate the download:

git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b
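
One variation I plan to test is skipping the LFS smudge step during the clone so only the small pointer files come down, then pulling the large objects afterwards; I haven't confirmed this avoids the memory/throughput problems:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/meta-llama/Llama-2-7b
cd Llama-2-7b
# fetch the actual LFS objects (optionally restrict with --include="pattern")
git lfs pull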

End goal:

My end goal is to save these Hugging Face repositories to S3. I know I can download a repo locally and then upload it via aws s3 cp large_file s3://bucket/target_folder/, but some of these repos are over 130GB, so I really want to avoid downloading them to local storage before uploading them to S3. If anyone has an alternative method of getting these repos into S3, I'd be open to hearing that as well.
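
One idea I've been toying with (untested) is streaming each large file straight from its Hugging Face download URL into S3 with the AWS CLI, so nothing touches local disk; the file name, bucket, and token below are placeholders, and I understand --expected-size is needed for very large uploads from stdin:

# <file> and my-bucket are placeholders; gated repos need the Authorization header
curl -L -H "Authorization: Bearer $HF_TOKEN" \
  "https://huggingface.co/meta-llama/Llama-2-7b/resolve/main/<file>" | \
  aws s3 cp - "s3://my-bucket/Llama-2-7b/<file>" --expected-size 150000000000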

Questions for the Community:

So, in summary, I'm after answers to these questions:

  1. Has anyone used Git LFS to clone Hugging Face repositories on EC2 instances and faced similar issues?
  2. Are there alternative solutions or configurations I should consider before scaling up my instances further? (e.g. limiting throughput, like the sketch after this list)
  3. Is there another approach to save these Hugging Face repositories with large files (10-200GB) directly to S3?
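
For question 2, the kind of configuration I have in mind is something like capping Git LFS's parallel transfers so a small instance isn't saturated, though I haven't verified that this actually prevents the failures:

# default is several parallel transfers; drop it to one
git config --global lfs.concurrenttransfers 1
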
1 Answer

Hi, you may want to try a direct transfer from Git to S3 via Mountpoint for Amazon S3, which is now generally available and ready for production workloads. See https://aws.amazon.com/blogs/aws/mountpoint-for-amazon-s3-generally-available-and-ready-for-production-workloads/

This way, the I/O goes directly to S3 (mounted like a regular disk) instead of to the local EC2 instance disk, which may help you avoid some of the troubles you describe above.
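
As a rough sketch on Amazon Linux (bucket name and mount path are placeholders; note that Mountpoint supports sequential writes of new objects only, so it is better suited as a destination for the downloaded files than as a full git working directory):

# install Mountpoint for Amazon S3 (x86_64 build)
wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.rpm
sudo yum install -y ./mount-s3.rpm

# mount the target bucket and write the downloaded files into it
mkdir -p /mnt/hf-models
mount-s3 my-model-bucket /mnt/hf-models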

Best,

Didier

answered 9 months ago
