Pre-built AWS Solutions: Deploying Low Latency, High Throughput Inference using AWS Graviton or AWS Inferentia on Amazon EKS

Content level: Intermediate

Announcement for pre-built AWS solutions

Accelerate your product’s time to market with pre-built AWS Solutions. Now available: Guidance for Low Latency, High Throughput Inference using AWS Graviton or AWS Inferentia on Amazon EKS.

This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It covers the basic implementation requirements and shows how you can pack thousands of unique PyTorch deep learning (DL) models into a scalable architecture. PyTorch is an open-source machine learning framework that supports the path from prototyping to deployment. We also explore a mix of Amazon Elastic Compute Cloud (Amazon EC2) instance families to develop an optimal design on efficient compute (such as AWS Graviton and AWS Inferentia), so that you can scale inference efficiently and cost-effectively.
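
To give a feel for what packing many models onto shared compute can involve, here is a minimal sketch of one common heuristic, first-fit decreasing bin packing. It is an illustration under stated assumptions, not the method the Guidance uses: the function name pack_models, the model names, the memory sizes, and the per-pod capacity are all hypothetical.

```python
from typing import Dict, List


def pack_models(model_sizes_mb: Dict[str, int], pod_capacity_mb: int) -> List[List[str]]:
    """Assign models to pods using first-fit decreasing bin packing.

    model_sizes_mb and pod_capacity_mb are hypothetical inputs: the memory
    each model needs and the memory one pod can offer, both in megabytes.
    """
    pods: List[List[str]] = []  # model names assigned to each pod
    free: List[int] = []        # remaining capacity of each pod, in MB

    # Consider the largest models first; break ties by name so the
    # result is deterministic.
    ordered = sorted(model_sizes_mb.items(), key=lambda kv: (-kv[1], kv[0]))

    for name, size in ordered:
        if size > pod_capacity_mb:
            raise ValueError(f"{name} ({size} MB) exceeds pod capacity")
        # Place the model in the first pod that still has room.
        for i, remaining in enumerate(free):
            if size <= remaining:
                pods[i].append(name)
                free[i] -= size
                break
        else:
            # No existing pod fits, so open a new one.
            pods.append([name])
            free.append(pod_capacity_mb - size)
    return pods


if __name__ == "__main__":
    # Hypothetical workload: 1,000 models with made-up memory footprints.
    sizes = {f"model-{i:04d}": 100 + (i % 5) * 50 for i in range(1000)}
    pods = pack_models(sizes, pod_capacity_mb=4000)
    print(f"{len(sizes)} models packed into {len(pods)} pods")
```

In practice the capacity per pod would depend on the instance family chosen (for example, a Graviton-based or Inferentia-based instance), and a real deployment would also weigh latency and throughput targets rather than memory alone.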

Learn more at: https://aws.amazon.com/solutions/guidance/low-latency-high-throughput-inference-using-efficient-compute-on-amazon-eks