Help with Inference Script for Amazon SageMaker Neo Compiled Models


Hello everyone, I was trying to run the example from the docs: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.html. I was able to run it successfully, but as soon as I changed the target_device to jetson_tx2 and re-ran the entire script, keeping the rest of the code as it is (the relevant compile call is included after my questions below), the model stopped working. I no longer get any inferences from the deployed endpoint, and every invocation errors out with this message:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."                

According to the troubleshooting docs (https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-inference.html), this looks like a problem with the model_fn() function. The inference script used by the example is here: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_neo_compilation_jobs/pytorch_torchvision/code/resnet18.py. It doesn't contain a model_fn() definition at all, yet it still worked for the ml_c5 target device. So could anyone please help me with the following questions:

  1. What changes does SageMaker Neo make to the model depending on the target_device type? It seems the same model is loaded differently for different target devices.
  2. Is there a way to determine how the model is expected to be loaded for a given target_device, so that I can define the model_fn() function myself in the inference script mentioned above?
  3. Lastly, could anyone help with an inference script for this same model (as in the docs above) that also works for the jetson_tx2 device?
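
For reference, the compilation and deployment part of my script looks roughly like the snippet below; the only thing I changed from the working run is target_instance_family. Variable names such as sagemaker_model, role and bucket are placeholders standing in for the names used in the example notebook.

    # Compile the traced ResNet18 with SageMaker Neo (placeholder variable names)
    compiled_model = sagemaker_model.compile(
        target_instance_family="jetson_tx2",        # was "ml_c5" in the working run
        input_shape={"input0": [1, 3, 224, 224]},   # ResNet18 input shape from the example
        output_path="s3://{}/neo-output".format(bucket),
        role=role,
        framework="pytorch",
        framework_version="1.6",                    # adjust to the version used to trace the model
    )

    # Deployment left exactly as in the example notebook
    predictor = compiled_model.deploy(
        initial_instance_count=1,
        instance_type="ml.c5.9xlarge",
    )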

Any suggestions or links on how to resolve this issue would be really helpful.

Rupesh
asked a year ago · 255 views
1 Answer
Accepted Answer

As you mentioned, you changed the Neo compilation target from ml_c5 to jetson_tx2, so the compiled model now requires the jetson_tx2 runtime. If you kept the rest of the code unchanged, the model is still deployed to an ml.c5.9xlarge instance, which does not provide an NVIDIA Jetson environment.

The model can't be loaded and errors out because the Jetson is an NVIDIA GPU device, while a c5 instance has only CPUs and no CUDA environment.
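
You can confirm the mismatch from the API by comparing the target device recorded on the compilation job with the instance type behind the endpoint. A minimal sketch, assuming placeholder job and endpoint names:

    import boto3

    sm = boto3.client("sagemaker")

    # Target device the model was compiled for (job name is a placeholder)
    job = sm.describe_compilation_job(CompilationJobName="resnet18-jetson-tx2")
    print(job["OutputConfig"]["TargetDevice"])            # e.g. jetson_tx2

    # Instance type the endpoint is actually running on (endpoint name is a placeholder)
    ep = sm.describe_endpoint(EndpointName="pytorch-neo-endpoint")
    cfg = sm.describe_endpoint_config(EndpointConfigName=ep["EndpointConfigName"])
    print(cfg["ProductionVariants"][0]["InstanceType"])   # e.g. ml.c5.9xlarge

If the two don't belong to the same family, invocations will keep timing out the way you described.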

If you compile the model with jetson_tx2 as the target, you should download the compiled model and run it on an actual NVIDIA Jetson device.
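
On the device itself, the downloaded model.tar.gz is extracted and loaded with the DLR runtime (https://github.com/neo-ai/neo-ai-dlr) rather than through a SageMaker endpoint. A minimal sketch, assuming the archive was extracted to ./compiled_model on the Jetson:

    # On the Jetson TX2, after installing the dlr package built for the device
    import numpy as np
    from dlr import DLRModel

    # Directory where the downloaded model.tar.gz was extracted (placeholder path)
    model = DLRModel("./compiled_model", dev_type="gpu")

    # Dummy ResNet18-style input; replace with a preprocessed image batch
    x = np.random.rand(1, 3, 224, 224).astype("float32")
    outputs = model.run(x)
    print(outputs[0].shape)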

AWS
answered a year ago
  • It looks like I overlooked where the model was actually being deployed. Thanks a lot for pointing it out.
