Help with Inference Script for Amazon Sagemaker Neo Compiled Models


Hello everyone, I was trying to run the example mentioned in the docs. It ran successfully, but after I changed the target_device to jetson_tx2 and reran the entire script with the rest of the code unchanged, the model stopped working. I get no inferences from the deployed model, and every invocation fails with the message:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."

According to the troubleshooting docs, this seems to be an issue with the model_fn() function. The inference script used by this example is mentioned here, and it doesn't contain a model_fn() definition, yet it still worked for the target device ml_c5. Could anyone please help me with the following questions:

  1. What changes does SageMaker Neo make to the model depending on the target_device type? It seems the same model is loaded differently for different target devices.
  2. Is there any way to determine how the model is expected to load for a given target_device type, so that I could define the model_fn() function myself in the inference script mentioned above?
  3. Lastly, can anyone please help with an inference script for the same model from the docs above that also works for the jetson_tx2 device?

Any suggestions or links on how to resolve this issue would be really helpful.

asked 20 days ago
1 Answer
Accepted Answer

As you mentioned, you changed the Neo compilation target from ml_c5 to jetson_tx2, so the compiled model now requires the jetson_tx2 runtime. If you kept the rest of the code unchanged, the model is still deployed to an ml.c5.9xlarge EC2 instance, which doesn't provide an NVIDIA Jetson environment.

The model can't be loaded and errors out because Jetson is an NVIDIA GPU-based device, while c5 instances are equipped only with CPUs and have no CUDA environment.

If you compile the model with jetson_tx2 as the target, you should download the compiled model and run it on a real NVIDIA Jetson device.
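
To catch this kind of mismatch before deploying, a small guard along these lines might help. It is only a sketch, not part of the SageMaker SDK: the function names endpoint_family() and can_host() are hypothetical, and it assumes that cloud targets are named with an ml_ prefix (like ml_c5 in this thread) while edge targets such as jetson_tx2 are not.

```python
from typing import Optional


def endpoint_family(target_device: str) -> Optional[str]:
    """Map a Neo target such as 'ml_c5' to an instance family such as 'ml.c5'.

    Assumption: only targets prefixed with 'ml_' correspond to SageMaker
    hosting instances; anything else (e.g. 'jetson_tx2') is treated as an
    edge device and returns None.
    """
    if not target_device.startswith("ml_"):
        return None
    # Replace only the first underscore: 'ml_c5' -> 'ml.c5'
    return target_device.replace("_", ".", 1)


def can_host(target_device: str, instance_type: str) -> bool:
    """Return True if a model compiled for target_device matches instance_type."""
    family = endpoint_family(target_device)
    return family is not None and instance_type.startswith(family + ".")


if __name__ == "__main__":
    for target in ("ml_c5", "jetson_tx2"):
        ok = can_host(target, "ml.c5.9xlarge")
        print(f"{target} on ml.c5.9xlarge: {'ok' if ok else 'mismatch'}")
```

Calling can_host() right before the deploy step would flag the jetson_tx2 and ml.c5.9xlarge combination from the question as a mismatch, rather than letting the endpoint time out at invocation.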

answered 16 days ago
  • It looks like I overlooked where the model was actually being deployed. Thanks a lot for pointing it out.
