Host multiple TensorFlow computer vision models using Amazon SageMaker multi-model endpoints


Hi All,

Greetings!!

Could you please clarify the two questions below, which relate to this post?

https://aws.amazon.com/blogs/machine-learning/host-multiple-tensorflow-computer-vision-models-using-amazon-sagemaker-multi-model-endpoints/

  1. As per the AWS documentation (https://docs.aws.amazon.com...), "Multi-model endpoints are not supported on GPU instance types". Yet I see this post uses 'instance_type': 'ml.m5.2xlarge' to train both the image classification and sign language digit classification models.

As per the instance configuration, the 'ml.m5.2xlarge' instance type has no GPU cores or GPU memory. Since deep learning model training is generally understood to need a GPU instance, we don't understand how both classification models can be trained on 'ml.m5.2xlarge'.

Does this image (IMAGE_URI = '763104351884.dkr.ecr.us-eas...) have GPU cores?

ml.m5.2xlarge instance details:

  • Compute type: Standard Instances
  • vCPU: 8
  • Memory: 32 GiB
  • GPU: 0
  • GPU memory: 0
  • Network performance: Up to 10 Gigabit
  • Storage: EBS only

  2. As mentioned in this post, I believe we have one production variant each of the CIFAR and sign-language models, meaning we have v1 models. Let's assume we get a new set of images for classification, so we decide to train and create a new variant of the CIFAR model.

So we have 2 versions of the CIFAR model and 1 version of the sign-language model.

How can we get inference from the two production variants of the CIFAR model? How can we do A/B testing for the CIFAR models?

Thanks in advance.

asked 10 months ago · 115 views
3 Answers

Hi Vinayak & hope I can help:

Let's take (1) GPU vs CPU first:

  • Yes, you're correct that Multi-Model Endpoints don't currently support GPU instance types (as documented here, and a lot of ML frameworks automatically reserve full GPU capacity which would make this tricky in practice anyway).
  • I see from over on the blog post that the full container image URI you linked (which got chopped off here on rePost) was: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.3.1-cpu-py37-ubuntu18.04 - which is a -cpu- image so is set up for CPU-only instance types - excluding GPU tools like CUDA, etc.
  • From the notebook of this sample, I see that it's actually training the model on CPU-only instances too (using ml.m5.2xlarge in the Estimator constructor).

Training small DL models on CPU only, without GPU acceleration, is perfectly possible... Just often much slower because of the nature of the workload - especially as models and datasets get bigger. It should also be fine (for faster training speed) to train on GPU-accelerated instances (setting a different instance type in the Estimator) but then deploy for CPU-only inference (keeping m5 or similar in the MultiDataModel).
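To make that split concrete, here's a minimal sketch of the train-on-GPU / deploy-on-CPU pattern with the SageMaker Python SDK. The entry point, role ARN, and S3 paths are placeholders (not from the blog post); the -cpu- inference image URI is the one the post uses.

```python
# Sketch only: assumes the blog post's TensorFlow 2.3.1 setup.
# "train.py", the role ARN, and the S3 paths below are placeholders.
from sagemaker.tensorflow import TensorFlow
from sagemaker.multidatamodel import MultiDataModel

# Train on a GPU-accelerated instance for speed...
estimator = TensorFlow(
    entry_point="train.py",  # placeholder training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # GPU instance, training only
    framework_version="2.3.1",
    py_version="py37",
)
estimator.fit({"training": "s3://my-bucket/train-data"})  # placeholder path

# ...then deploy behind a CPU-only multi-model endpoint. Note the -cpu-
# inference image: MME doesn't support GPU instance types.
mme = MultiDataModel(
    name="tf-multi-model",
    model_data_prefix="s3://my-bucket/mme-artifacts/",  # placeholder prefix
    image_uri=(
        "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
        "tensorflow-inference:2.3.1-cpu-py37-ubuntu18.04"
    ),
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
)
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge",  # CPU-only instance for inference
)
```

The key point is simply that the training instance type and the endpoint instance type are independent choices, as long as the inference image matches the deployment hardware.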

Generally this could be as easy as just changing the instance types in the notebook, but there are a couple of subtle things you might need to bear in mind if you see problems:

  • Your SageMaker "Model" will reference a container image, which will be optimized for either CPU or GPU instances. This is why the SageMaker Python SDK Estimator.fit() API that runs a "Training Job" doesn't automatically register the output in the SageMaker "Models" list - it doesn't know whether the image will be the same or different until you define target infrastructure (e.g. with mme.deploy() in the notebook). I think you'll see that even creating the mme MultiDataModel in the notebook doesn't create a "Model" in SageMaker console until that deploy call.
  • If you're providing any custom inference scripts e.g. input_handler that reference CUDA devices or other GPU-specific tools, you'll need to check these scripts are set up to work okay on either GPU or CPU-only deployments. The example you linked is already CPU-only so that should be fine.

Now for (2) A/B testing:

Between variants, MME, multi-container endpoints, or even separate SageMaker endpoints with some proxy in front - you have multiple options for A/B testing models depending what's easiest for your overall deployment, monitoring, integration and rollback requirements.

Some tips I would suggest that might guide your choice:

  • For MCE, MME and separate-endpoints, your client logic would always need to explicitly choose one of the options/versions to invoke. Using ProductionVariants, you can either explicitly specify a TargetVariant from the client side or let the configured variant weights do their thing and weight invocations automatically.
  • MME supports a potentially large number of available models at a time (e.g. more than memory), but doesn't (yet?) offer a way to explicitly purge cached models from endpoint memory. This means you can't really overwrite the "same" model name on S3 within one MME - You'd need to add new versions in as new models on S3, and make sure your clients start requesting the new model names.
  • I believe yes it should be possible to set up an endpoint with multiple variants, each of which is a MultiModel... Each variant would run on separate infrastructure instance(s). I guess you could probably use a new variant pointing to the same S3 models folder as a way to force clear an MME cache - because you could spin up new serving containers, push traffic over to them, and shut the old variant down? But this would be a pretty niche case.
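To make the ProductionVariants option concrete, here's a minimal sketch of the request payloads you'd pass to the boto3 create_endpoint_config and invoke_endpoint APIs (the actual AWS calls are shown commented out). The endpoint name, model names, and weights are all made up for illustration:

```python
import json

# Hypothetical endpoint config: two variants, each pointing at a different
# SageMaker Model (a v1 and a v2), with a 90/10 traffic split.
endpoint_config = {
    "EndpointConfigName": "cifar-ab-test",  # made-up name
    "ProductionVariants": [
        {
            "VariantName": "cifar-v1",
            "ModelName": "cifar-model-v1",  # made-up Model name
            "InstanceType": "ml.m5.2xlarge",  # CPU-only, as in the post
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,  # ~90% of traffic
        },
        {
            "VariantName": "cifar-v2",
            "ModelName": "cifar-model-v2",  # made-up Model name
            "InstanceType": "ml.m5.2xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,  # ~10% of traffic
        },
    ],
}
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config)

# Per request: either let the weights route traffic automatically, or pin a
# variant explicitly with TargetVariant for a controlled comparison. If each
# variant is itself an MME, you'd also pass TargetModel to pick the artifact.
invoke_args = {
    "EndpointName": "cifar-ab-test",
    "ContentType": "application/json",
    "Body": json.dumps({"instances": [[0.0] * 32]}),  # dummy payload
    "TargetVariant": "cifar-v2",  # optional: pin a specific variant
}
# boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_args)

weights = {v["VariantName"]: v["InitialVariantWeight"]
           for v in endpoint_config["ProductionVariants"]}
print(weights)  # {'cifar-v1': 0.9, 'cifar-v2': 0.1}
```

Shifting the split later (e.g. 50/50, then 0/100 to complete a rollout) is just a call to UpdateEndpointWeightsAndCapacities with new weights, with no endpoint downtime.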
EXPERT
Alex_T
answered 10 months ago

@Alex_T

Thanks for your reply.

Let's talk about MME only.

As mentioned in this post, I believe we have one production variant each of the CIFAR and sign-language models, meaning we have v1 models. Let's assume we get a new set of images for classification, so we decide to train and create a new variant of the CIFAR model. So we have 2 versions of the CIFAR model and 1 version of the sign-language model.

Will both v1 models (CIFAR and sign-language) keep running in one MME, while the new production variant (version) of the CIFAR model runs in a new MME?

In general, how can we do A/B testing in MME? If possible, please clarify with a simple example.

Thanks

answered 10 months ago
