deployment of custom model to device failed


Hi, I'm trying to deploy a custom model trained with SageMaker to my DeepLens device. The model is based on MXNet ResNet-50 and makes good predictions when deployed to a SageMaker endpoint. However, when deploying to DeepLens, we get errors when the Lambda function tries to optimize the model, and the device makes no inferences.
The Lambda log shows the following (errors are reported at mo.py lines 161 and 173 - what do these mean?):

[2020-02-09T19:00:28.525+02:00][ERROR]-mo.py:161,
[2020-02-09T19:00:30.088+02:00][ERROR]-mo.py:173,
[2020-02-09T19:00:30.088+02:00][INFO]-IoTDataPlane.py:115,Publishing message on topic "$aws/things/deeplens_rFCSPJQhTGS5Y9NGkJIz8g/infer" with Payload "Loading action cat-dog model"
[2020-02-09T19:00:30.088+02:00][INFO]-Lambda.py:92,Invoking Lambda function "arn:aws:lambda:::function:GGRouter" with Greengrass Message "Loading action cat-dog model"
[2020-02-09T19:00:30.088+02:00][INFO]-ipc_client.py:142,Posting work for function [arn:aws:lambda:::function:GGRouter] to http://localhost:8000/2016-11-01/functions/arn:aws:lambda:::function:GGRouter
[2020-02-09T19:00:30.099+02:00][INFO]-ipc_client.py:155,Work posted with invocation id [158058ef-7386-49c5-791a-9c61bd1b9951]
[2020-02-09T19:00:30.109+02:00][INFO]-IoTDataPlane.py:115,Publishing message on topic "$aws/things/deeplens_rFCSPJQhTGS5Y9NGkJIz8g/infer" with Payload "Error in cat-dog lambda: Model path  is invalid"
[2020-02-09T19:00:30.11+02:00][INFO]-Lambda.py:92,Invoking Lambda function "arn:aws:lambda:::function:GGRouter" with Greengrass Message "Error in cat-dog lambda: Model path  is invalid"

It seems to me that the model optimizer is failing for some reason and not producing the optimized output, but we cannot understand the errors. Is there somewhere we can look them up?
By the way, the device has all the latest updates installed, and the MXNet version on the device is 1.4.0.
Many thanks

Edited by: Mike9753 on Feb 10, 2020 1:02 AM

asked 4 years ago · 257 views
2 Answers

It seems that this problem is caused by the MXNet version installed on the device not matching the MXNet version SageMaker used to train the model.
After downgrading the MXNet version on the device to 1.2.0 or 1.1.0, model optimization works and the device starts producing inferences:

sudo pip3 install mxnet==1.2.0

The MXNet version used by SageMaker is set with the framework_version parameter of sagemaker.mxnet.estimator. When not specified, as in my case, it should default to 1.2.1; however, installing mxnet 1.2.1 on my device produced the same errors. (Installing 1.2.0 works fine, though.)
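Since the optimizer's error messages are opaque, a quick pre-deployment sanity check can at least surface an obvious version mismatch before the model reaches the device. The sketch below is purely illustrative: the mxnet_versions_match helper and the version strings are hypothetical, and, as noted above, even an exact match (1.2.1 trained vs. 1.2.1 on device) did not guarantee success in my case, so treat this as a rough first-pass check only.

```python
def mxnet_versions_match(trained_with: str, on_device: str) -> bool:
    """Rough heuristic: compare the full version tuples.

    Caveat (observed in this thread): an exact match is not a
    guarantee of success, and 1.2.0 on the device worked even
    though training defaulted to 1.2.1.
    """
    def parse(version: str):
        return tuple(int(part) for part in version.split("."))
    return parse(trained_with) == parse(on_device)

# Example: SageMaker's default MXNet (1.2.1) vs. the device's 1.4.0
print(mxnet_versions_match("1.2.1", "1.4.0"))  # False
```

Running this before a deployment would have flagged the 1.2.1-vs-1.4.0 mismatch that caused the cryptic mo.py errors above.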

answered 4 years ago

It would be nice if the logs reported something meaningful, such as "incompatible MXNet versions between model and device".

Edited by: Mike9753 on Feb 26, 2020 6:07 AM

answered 4 years ago
