I apologize for the frustration you're experiencing with this error. The "No space left on device" error is probably not related to the overall storage capacity of your instance, but to how that space is used during the training process.
This error typically occurs when a specific filesystem or partition runs out of space, even if there's plenty of storage available elsewhere on the system. In your case, it appears that the root filesystem (/) is filling up during the training process.
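To see which filesystem is actually full, a minimal Python sketch along these lines reports usage per path. The paths in `CANDIDATE_PATHS` and the helper name `usage_report` are assumptions for illustration; substitute the mount points that "df -h" lists on your instance.

```python
import os
import shutil

# Hypothetical paths to inspect; replace with the mount points on your instance.
CANDIDATE_PATHS = ["/", "/home", "/tmp"]


def usage_report(paths):
    """Return (path, device_id, percent_used, free_gib) for each existing path."""
    rows = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip paths that are absent on this machine
        usage = shutil.disk_usage(path)
        percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
        # st_dev identifies the filesystem; equal values mean the same filesystem.
        rows.append((path, os.stat(path).st_dev, percent_used, usage.free / 2**30))
    return rows


if __name__ == "__main__":
    for path, device, percent, free_gib in usage_report(CANDIDATE_PATHS):
        print(f"{path:<10} dev={device:<8} used={percent:5.1f}%  free={free_gib:8.1f} GiB")
```

Paths that report the same device ID live on the same filesystem, so moving files between them frees no space.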
Here are a few potential solutions to consider:
- Check disk usage: Run "df -h" to see which filesystems are full. Temporary files or logs may be filling one specific partition.
- Clean up unnecessary files: Remove large files you no longer need, along with old Docker images and containers that are taking up space.
- Adjust partition sizes: If possible, allocate more space to the root filesystem. On an EBS-backed instance, this usually means increasing the volume size and then extending the partition and filesystem.
- Use a different directory: Point the working directory of your training process at a filesystem with more free space, such as a separately mounted data volume. Confirm with "df -h" first, because /home and /tmp often sit on the root filesystem and may not provide any extra room.
- Monitor disk usage during training: Add logging that tracks disk usage as training progresses, so you can identify where the space is being consumed.
- Optimize your training pipeline: Look for ways to reduce disk usage, such as processing data in smaller batches or cleaning up temporary files more aggressively.
- Consider using Amazon EFS: If the issue persists, consider Amazon Elastic File System (EFS) for your training data and output. It provides scalable storage that is separate from your instance's local storage.
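Two of the suggestions above, using a different directory and monitoring disk usage during training, can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: `pick_roomiest_dir`, `DiskUsageLogger`, the candidate paths, and the 60-second interval are hypothetical names and values chosen for illustration.

```python
import logging
import os
import shutil
import tempfile
import threading

log = logging.getLogger("disk_monitor")


def pick_roomiest_dir(candidates):
    """Return the existing, writable candidate directory with the most free bytes."""
    usable = [d for d in candidates if os.path.isdir(d) and os.access(d, os.W_OK)]
    if not usable:
        raise RuntimeError("none of the candidate directories is usable")
    return max(usable, key=lambda d: shutil.disk_usage(d).free)


class DiskUsageLogger:
    """Log free space for a set of paths at a fixed interval on a background thread."""

    def __init__(self, paths, interval_seconds=60.0):
        self.paths = list(paths)
        self.interval_seconds = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def snapshot(self):
        """Return {path: free_bytes} for every monitored path."""
        return {p: shutil.disk_usage(p).free for p in self.paths}

    def _run(self):
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self.interval_seconds):
            for path, free in self.snapshot().items():
                log.info("free on %s: %.1f GiB", path, free / 2**30)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Hypothetical candidates; use the mount points that exist on your instance.
    scratch = pick_roomiest_dir(["/tmp", os.path.expanduser("~")])
    # The tempfile module uses this setting as its default directory.
    tempfile.tempdir = scratch
    monitor = DiskUsageLogger(["/", scratch], interval_seconds=60.0)
    monitor.start()
    try:
        pass  # run the training job here
    finally:
        monitor.stop()
```

Setting `tempfile.tempdir` only affects code that goes through Python's `tempfile` module. Libraries that write to their own cache or output directories would need to be pointed at the larger filesystem through their own settings.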
If none of these solutions work, you may need to reach out to AWS support for more detailed troubleshooting specific to your setup and the Llama 3.1 8B model training process.
Sources
AWS CodeBuild Failed with Error: no space left on device | AWS re:Post
"Cannot write: No space left on device" error even though there is storage available | AWS re:Post
No space left on device when deploy ECS Fargate Service with CDK | AWS re:Post