Here is a blog post about adding an API Gateway in front of a SageMaker endpoint: https://aws.amazon.com/blogs/machine-learning/creating-a-machine-learning-powered-rest-api-with-amazon-api-gateway-mapping-templates-and-amazon-sagemaker/ How long does your model currently take for inference? If it is slower than you expected, you might want to choose a larger instance type, or a GPU instance for models that can take advantage of one. Take a look at https://github.com/aws-samples/aws-marketplace-machine-learning/blob/master/right_size_your_sagemaker_endpoints/Right-sizing%20your%20Amazon%20SageMaker%20Endpoints.ipynb
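Before resizing, it helps to measure what you actually have. Here is a minimal sketch for timing endpoint calls client-side and reporting p50/p90 latency; the helper and the endpoint name in the commented usage are illustrative, not part of any AWS SDK:

```python
import time
import statistics

def measure_latency(invoke_fn, payload, n=20):
    """Time n calls to an endpoint-invoking callable; return p50/p90 in ms.

    invoke_fn is assumed to wrap something like
    boto3.client("sagemaker-runtime").invoke_endpoint(...).
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        invoke_fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p90_ms": samples[int(0.9 * (len(samples) - 1))],
    }

# Usage with the real runtime client (endpoint name is a placeholder):
# import boto3
# smr = boto3.client("sagemaker-runtime")
# stats = measure_latency(
#     lambda body: smr.invoke_endpoint(
#         EndpointName="my-endpoint",      # assumption: your endpoint name
#         ContentType="application/json",
#         Body=body,
#     ),
#     payload=b'{"instances": [[1.0, 2.0]]}',
# )
```

Note this measures end-to-end latency including network time from your client; compare it against the `ModelLatency` CloudWatch metric to see how much is the model itself.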
You can also consider using asynchronous inference. When API Gateway receives a request, trigger an async inference job and return immediately. The endpoint then writes the result to an S3 bucket, and you notify your user either via SNS -> email or through a polling API, etc.
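A minimal sketch of the submit step, assuming the endpoint was already created with an `AsyncInferenceConfig` (S3 output path, optional SNS topics) and the request payload was uploaded to S3 first; the function and the names in the usage comment are illustrative:

```python
def submit_async_inference(smr_client, endpoint_name, input_s3_uri):
    """Kick off a SageMaker Asynchronous Inference job and return immediately.

    smr_client is a boto3 "sagemaker-runtime" client. invoke_endpoint_async
    takes an S3 URI rather than an inline body and returns right away.
    """
    resp = smr_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,   # request payload already uploaded to S3
        ContentType="application/json",
    )
    # The endpoint writes the result to OutputLocation when done; hand this
    # back to the caller (e.g. a Lambda behind API Gateway) so the user can
    # poll that S3 location or wait for the SNS notification.
    return {
        "output_s3_uri": resp["OutputLocation"],
        "inference_id": resp["InferenceId"],
    }

# Usage (names are placeholders):
# import boto3
# smr = boto3.client("sagemaker-runtime")
# job = submit_async_inference(
#     smr, "my-async-endpoint", "s3://my-bucket/in/request.json"
# )
```

The immediate return is what keeps you inside API Gateway's 29-second integration timeout even when the model itself is slow.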