Best way to do low-latency lookups for ML model prediction caching?
I'd like to improve our AWS setup for ML model prediction caching. Say we want to serve a lookup table of past predictions upstream of a REST API, and consult it on every prediction request so we avoid re-running the model when a given input has already been scored. It's OK to wait several minutes between cache updates, and a typical TTL could be anywhere from minutes to days. What is the best way to do that in AWS?
- Use the caching functionality of API GW?
- Use a warm Lambda@Edge function that keeps a local lookup file of past predictions?
- Use DAX or ElastiCache upstream of the model call (see the cache-aside sketch after this list)? Is there a way to have DAX or ElastiCache live at the edge?
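
To make the third option concrete, here is roughly the cache-aside pattern I have in mind, running in the API's backend (e.g. a Lambda or container). The Redis host, SageMaker endpoint name, and TTL are placeholders, not an existing setup:

```python
import hashlib
import json

import boto3
import redis

# Placeholders: point these at your actual ElastiCache (Redis) endpoint
# and SageMaker endpoint.
cache = redis.Redis(host="my-cache.xxxxxx.cache.amazonaws.com", port=6379)
sm_runtime = boto3.client("sagemaker-runtime")

CACHE_TTL_SECONDS = 3600  # anywhere from minutes to days, per the question


def predict(payload: dict) -> dict:
    """Cache-aside lookup: return a cached prediction if present,
    otherwise invoke the model and store the result with a TTL."""
    # Derive a stable cache key from the model input.
    key = "pred:" + hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()

    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    response = sm_runtime.invoke_endpoint(
        EndpointName="my-model-endpoint",  # placeholder
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    prediction = json.loads(response["Body"].read())

    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(prediction))
    return prediction
```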
You can use CloudFront in front of your API to cache HTTP responses according to the Cache-Control header sent by your origin. Please note that this would only work for GET/HEAD HTTP requests.
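
For example, if the origin is a Lambda proxy integration, it could set the header like this (a minimal sketch; the handler and the max-age value are illustrative):

```python
import json


def handler(event, context):
    """Lambda proxy integration behind API Gateway / CloudFront.
    The Cache-Control header tells CloudFront how long it may cache
    this response (1 hour here; adjust to your TTL)."""
    prediction = {"score": 0.87}  # placeholder for the real model output

    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            "Cache-Control": "public, max-age=3600",
        },
        "body": json.dumps(prediction),
    }
```

Note that for per-input caching the model inputs would need to appear in the URL or query string, and the CloudFront cache policy would need to include the query string in the cache key.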