1 Answer
Let me help you understand this situation:
- EFA (Elastic Fabric Adapter) combined with NCCL (NVIDIA Collective Communications Library) is indeed commonly used for distributed machine learning workloads.
- You're correct that g4dn.8xlarge is one of the lowest-cost GPU instances that supports EFA. It has 32 vCPUs, so launching one requires a vCPU quota of at least 32.
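To check which GPU instance types support EFA in a given Region, one option is to query the EC2 `describe_instance_types` API with the `network-info.efa-supported` filter. The following is a minimal sketch, assuming `boto3` is installed and credentials are configured; the helper names `efa_gpu_instance_types` and `fetch_efa_instance_types` are hypothetical, chosen here for illustration.

```python
def efa_gpu_instance_types(instance_types):
    """Return (name, vCPUs) pairs for EFA-capable GPU types, fewest vCPUs first.

    `instance_types` is a list of dicts shaped like the "InstanceTypes"
    entries returned by EC2 describe_instance_types.
    """
    rows = []
    for it in instance_types:
        # Skip types without EFA support.
        if not it.get("NetworkInfo", {}).get("EfaSupported", False):
            continue
        # Skip types without GPUs (GpuInfo is only present on GPU types).
        if "GpuInfo" not in it:
            continue
        rows.append((it["InstanceType"], it["VCpuInfo"]["DefaultVCpus"]))
    return sorted(rows, key=lambda row: (row[1], row[0]))


def fetch_efa_instance_types(region_name):
    """Fetch EFA-capable instance types for one Region (needs AWS credentials)."""
    import boto3  # imported here so the pure helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[{"Name": "network-info.efa-supported", "Values": ["true"]}]
    )
    return [it for page in pages for it in page["InstanceTypes"]]


# Example (hypothetical Region; requires credentials):
# for name, vcpus in efa_gpu_instance_types(fetch_efa_instance_types("us-east-1")):
#     print(name, vcpus)
```

The vCPU count in each pair is the amount of quota a single running instance of that type consumes.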
To resolve your situation, you have several options:
- Request a quota increase that covers g4dn instances:
  - Go to the Service Quotas console in AWS and open Amazon EC2
  - Search for "Running On-Demand G and VT instances" (g4dn has no quota of its own; it counts against this shared quota)
  - Request an increase to at least 32, since the quota is measured in vCPUs rather than instances
  - Provide a business justification explaining your use case
  - AWS often responds within 24-48 hours, though response times vary
- Alternative approaches while waiting for the quota increase:
  - Start with smaller non-EFA instances to develop and test your code
  - Consider using Spot Instances if your workload can handle interruptions (note that Spot has a separate quota, "All G and VT Spot Instance Requests")
  - Try a different AWS Region, since quotas are set per Region and you might have a higher one elsewhere
- If the quota increase is denied:
  - Provide a more detailed business case in a new request
  - Contact AWS Support for guidance
  - Consider working with an AWS Partner Network (APN) partner, who may be able to help with quota increases
Remember that new AWS accounts typically start with lower quotas for security reasons, but these can be increased with proper justification.
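The quota request can also be scripted through the Service Quotas API instead of the console. The following is a minimal sketch, assuming `boto3` is installed, credentials are configured, and the quota covering g4dn is named "Running On-Demand G and VT instances" in your account. The helper names (`find_quota`, `vcpus_to_request`, `request_increase_if_needed`) are hypothetical. Looking the quota up by name avoids hard-coding a quota code.

```python
QUOTA_NAME = "Running On-Demand G and VT instances"  # assumed quota name
G4DN_8XLARGE_VCPUS = 32  # the quota is counted in vCPUs, not instances


def find_quota(quotas, quota_name=QUOTA_NAME):
    """Return the quota dict whose QuotaName matches, or None if absent."""
    for quota in quotas:
        if quota.get("QuotaName") == quota_name:
            return quota
    return None


def vcpus_to_request(current_limit, instance_vcpus=G4DN_8XLARGE_VCPUS, instance_count=1):
    """Return the vCPU limit to ask for, or None if the current limit suffices."""
    needed = instance_vcpus * instance_count
    return needed if current_limit < needed else None


def request_increase_if_needed(region_name, instance_count=1):
    """Submit a quota increase request if the current limit is too low."""
    import boto3  # imported here so the pure helpers above work without boto3

    sq = boto3.client("service-quotas", region_name=region_name)
    paginator = sq.get_paginator("list_service_quotas")
    quotas = [q for page in paginator.paginate(ServiceCode="ec2") for q in page["Quotas"]]

    quota = find_quota(quotas)
    if quota is None:
        raise LookupError(f"Quota not found: {QUOTA_NAME}")

    desired = vcpus_to_request(quota["Value"], instance_count=instance_count)
    if desired is None:
        return None  # limit is already high enough; nothing to request

    return sq.request_service_quota_increase(
        ServiceCode="ec2",
        QuotaCode=quota["QuotaCode"],
        DesiredValue=float(desired),
    )
```

Because quotas are per Region, the same call with a different `region_name` shows whether another Region already has enough headroom.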
answered 8 months ago
