Help with Inference Script for Amazon SageMaker Neo Compiled Models

Hello everyone, I was trying to run the example from the docs: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.html. The example ran successfully as written. However, when I changed target_device to jetson_tx2 and reran the entire script with the rest of the code unchanged, the model stopped working. The deployed model returns no inferences, and every invocation fails with the message:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."

According to the troubleshooting docs (https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-inference.html), this seems to be an issue with the model_fn() function. The inference script used by this example is at https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_neo_compilation_jobs/pytorch_torchvision/code/resnet18.py . It does not define model_fn() at all, yet it worked for target device ml_c5. Could anyone please help me with the following questions:

  1. What changes does SageMaker Neo make to the model depending on the target_device type? It seems the same model is loaded differently for different target devices.
  2. Is there a way to determine how the model is expected to be loaded for a given target_device type, so that I can define model_fn() myself in the inference script mentioned above?
  3. Lastly, can anyone please help with an inference script for the same model from the docs above that also works for the jetson_tx2 device?

Any suggestions or links on how to resolve this issue would be really helpful.

Rupesh
asked 20 days ago, 33 views
1 Answer
Accepted Answer

As you mentioned, you changed the Neo compilation target from ml_c5 to jetson_tx2, so the compiled model now requires the jetson_tx2 runtime. If you kept the rest of the code unchanged, the model is still deployed to an ml.c5.9xlarge instance, which does not provide NVIDIA Jetson hardware.

The model can't be loaded and errors out because Jetson is an NVIDIA GPU-based device, while c5 instances have only CPUs and no CUDA environment.

If you compile the model with jetson_tx2 as the target, you should download the compiled model and run it on a real NVIDIA Jetson device.
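
To catch this kind of mismatch before deploying, a quick sanity check along these lines might help. This is only a sketch: the helper name can_host_on is hypothetical (it is not part of the SageMaker SDK), and it assumes that Neo cloud targets follow the ml_<family> naming pattern (for example ml_c5) and map to instance types of the form ml.<family>.<size>, while edge targets such as jetson_tx2 have no matching hosting instance.

```python
def can_host_on(target_device: str, instance_type: str) -> bool:
    """Return True if a Neo target looks deployable on the given instance type.

    Hypothetical helper, not a SageMaker API. Assumes cloud targets are named
    'ml_<family>' and instance types are named 'ml.<family>.<size>'.
    """
    # Edge targets (jetson_tx2, rasp3b, ...) do not start with 'ml_' and
    # therefore cannot be hosted on a SageMaker endpoint instance.
    if not target_device.startswith("ml_"):
        return False

    family = target_device[len("ml_"):]
    parts = instance_type.split(".")

    # Expect exactly 'ml', '<family>', '<size>' and a matching family.
    return len(parts) == 3 and parts[0] == "ml" and parts[1] == family


if __name__ == "__main__":
    for target in ("ml_c5", "jetson_tx2"):
        ok = can_host_on(target, "ml.c5.9xlarge")
        print(f"{target} on ml.c5.9xlarge: {'ok' if ok else 'mismatch'}")
```

With the values from your question, ml_c5 passes the check against ml.c5.9xlarge and jetson_tx2 does not, which matches what you observed at the endpoint.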

answered 16 days ago
  • It looks like I overlooked where the model was actually being deployed. Thanks a lot for pointing it out.
