Best configuration for inferencing with PyTorch models

I'm trying to build a public-facing web app that lets users run inference, with probably ten or so models available to them. My initial thought was a basic front-end webpage that communicates with a REST API server on an EC2 instance. But as I started planning this out in more detail, I found a lot of info about various AWS products, and they seem interesting but it's all pretty over my head.
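
To make it concrete, here's roughly the kind of API server I was picturing running on the EC2 instance (the model names and file paths are just placeholders, and I haven't settled on Flask vs. anything else):

```python
# Rough sketch of the REST API server on the EC2 instance.
# Model names and .pt paths below are placeholders for my ~10 models.
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load each model once at startup so individual requests don't pay the load cost.
MODEL_PATHS = {
    "image_classifier": "models/image_classifier.pt",  # placeholder
    "sentiment": "models/sentiment.pt",                 # placeholder
}
MODELS = {name: torch.jit.load(path).eval() for name, path in MODEL_PATHS.items()}


@app.route("/predict/<model_name>", methods=["POST"])
def predict(model_name):
    if model_name not in MODELS:
        return jsonify({"error": f"unknown model: {model_name}"}), 404
    payload = request.get_json()
    inputs = torch.tensor(payload["inputs"], dtype=torch.float32)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = MODELS[model_name](inputs)
    return jsonify({"outputs": outputs.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```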

I initially came to the site because I heard about Elastic Inference. After researching it more, it seems like Amazon is encouraging people to use Inferentia2 instead. I realize I could just run an EC2 instance, but I don't know how well that would scale if this app becomes popular. I've also read a bit about SageMaker, API Gateway, and even "serverless" options like Lambda, but I don't really know whether those integrate well with the low-cost inference products AWS offers.
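
From skimming the SageMaker docs, I think hosting a single model as an endpoint would look something like the sketch below, though I may well have the details wrong. The S3 path, IAM role, instance type, and inference.py script are all placeholders on my end:

```python
# My rough understanding of hosting one PyTorch model on a SageMaker endpoint.
# The S3 artifact, IAM role ARN, and instance type are placeholders.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/models/image_classifier.tar.gz",   # placeholder artifact
    role="arn:aws:iam::123456789012:role/MySageMakerRole",        # placeholder role
    entry_point="inference.py",   # script with the model_fn/predict_fn handlers
    framework_version="2.1",
    py_version="py310",
)

# Deploys a managed HTTPS endpoint; the instance type is where I'd presumably choose
# between CPU/GPU instances or something like Inferentia2 (ml.inf2.*) for cost.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Input format would depend on how inference.py is written; a nested list here.
result = predictor.predict([[0.1, 0.2, 0.3]])
print(result)
```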

Any advice on setting this kind of thing up?
