I have found the solution to make this happen.
In app.py, and in the file where you call torch.jit.load (if that is a different file from app.py), set the following environment variable:
import os
os.environ['NEURON_RT_NUM_CORES'] = '1'
This tells each gunicorn worker process to use one Neuron core, so you can run X workers, where X is the number of Neuron cores on the instance you are running on.
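As a rough sketch of how this could sit in app.py: the helper names configure_neuron_cores and load_model are hypothetical, and the sketch assumes the torch-neuron package for Inf1 and that the Neuron runtime reads the variable when the model is first loaded, so the variable is set before torch.jit.load is called.

```python
import os


def configure_neuron_cores(cores_per_worker: int = 1) -> str:
    """Set NEURON_RT_NUM_CORES for the current (worker) process."""
    if cores_per_worker < 1:
        raise ValueError("cores_per_worker must be at least 1")
    os.environ["NEURON_RT_NUM_CORES"] = str(cores_per_worker)
    return os.environ["NEURON_RT_NUM_CORES"]


# Set the variable at import time, before any model is loaded.
configure_neuron_cores(1)


def load_model(path: str):
    """Load a compiled model (hypothetical helper; path is up to you)."""
    # Imported lazily so this sketch can be read and tested without Neuron installed.
    import torch
    import torch_neuron  # noqa: F401  (assumed: torch-neuron package on Inf1)

    return torch.jit.load(path)
```

Because gunicorn imports the app in every worker, each worker would run configure_neuron_cores(1) on its own and request a single core.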
Inferentia is compatible with FastAPI. The error suggests the program is asking to allocate more cores than are available. As an example, let's assume the instance is an inf1.6xlarge, which has 16 Neuron Cores. Your gunicorn command should look like this:
gunicorn main-fastapi-demo:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8001
In your server code, main-fastapi-demo.py, make sure you set the environment variable after your import statements:
NUM_CORES = 4
os.environ['NEURON_RT_NUM_CORES'] = str(NUM_CORES)
Taken together, this starts four gunicorn workers, each of which gets four Neuron Cores, so a total of 4 x 4 = 16 Neuron Cores are allocated to your server.
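If you would rather keep these settings in one place, the same 4 x 4 layout could be written as a gunicorn config file. This is only a sketch of an alternative, not something from the answer above: it assumes you start the server with gunicorn -c gunicorn.conf.py main-fastapi-demo:app and that passing the variable through gunicorn's raw_env setting works for your setup instead of setting it in the server code.

```python
# gunicorn.conf.py (assumed file name, passed with `-c`)

# Neuron cores each worker should claim (same value as NUM_CORES in the server code).
NUM_CORES = 4

# Equivalent to: --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8001
workers = 4
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:8001"

# Environment variables gunicorn sets for the server processes.
raw_env = [f"NEURON_RT_NUM_CORES={NUM_CORES}"]
```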
You can mix and match these parameters: it doesn't have to be 4 x 4; it could also be 8 x 2 or 2 x 8. The best combination is found by benchmarking.
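To list the combinations worth benchmarking, a small helper like the one below may be handy. The function names core_layouts and fits are made up for illustration; the sketch only does the arithmetic (workers x cores per worker must not exceed the cores on the instance) and does not talk to the Neuron runtime.

```python
from typing import List, Tuple


def core_layouts(total_cores: int) -> List[Tuple[int, int]]:
    """Return every (workers, cores_per_worker) pair that uses all cores exactly."""
    return [
        (workers, total_cores // workers)
        for workers in range(1, total_cores + 1)
        if total_cores % workers == 0
    ]


def fits(workers: int, cores_per_worker: int, total_cores: int) -> bool:
    """True if the workers together ask for no more cores than the instance has."""
    return workers * cores_per_worker <= total_cores


# Layouts for a 16-core instance, (workers, cores_per_worker):
print(core_layouts(16))  # [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]

# Asking for more cores than available is the situation the error points to.
print(fits(4, 4, 16))  # True
print(fits(4, 8, 16))  # False
```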