deployment of custom model to device failed


Hi, I'm trying to deploy a custom model trained with SageMaker to my DeepLens device. The model is based on MXNet ResNet-50 and makes good predictions when deployed to a SageMaker endpoint. However, when deploying to DeepLens, we get errors when the Lambda function tries to optimize the model, and the device makes no inferences.
The Lambda log shows the following (errors are reported at mo.py lines 161 and 173 - what do these mean?):

[2020-02-09T19:00:28.525+02:00][ERROR]-mo.py:161,
[2020-02-09T19:00:30.088+02:00][ERROR]-mo.py:173,
[2020-02-09T19:00:30.088+02:00][INFO]-IoTDataPlane.py:115,Publishing message on topic "$aws/things/deeplens_rFCSPJQhTGS5Y9NGkJIz8g/infer" with Payload "Loading action cat-dog model"
[2020-02-09T19:00:30.088+02:00][INFO]-Lambda.py:92,Invoking Lambda function "arn:aws:lambda:::function:GGRouter" with Greengrass Message "Loading action cat-dog model"
[2020-02-09T19:00:30.088+02:00][INFO]-ipc_client.py:142,Posting work for function [arn:aws:lambda:::function:GGRouter] to http://localhost:8000/2016-11-01/functions/arn:aws:lambda:::function:GGRouter
[2020-02-09T19:00:30.099+02:00][INFO]-ipc_client.py:155,Work posted with invocation id [158058ef-7386-49c5-791a-9c61bd1b9951]
[2020-02-09T19:00:30.109+02:00][INFO]-IoTDataPlane.py:115,Publishing message on topic "$aws/things/deeplens_rFCSPJQhTGS5Y9NGkJIz8g/infer" with Payload "Error in cat-dog lambda: Model path  is invalid"
[2020-02-09T19:00:30.11+02:00][INFO]-Lambda.py:92,Invoking Lambda function "arn:aws:lambda:::function:GGRouter" with Greengrass Message "Error in cat-dog lambda: Model path  is invalid"

It seems to me that the model optimizer is failing for some reason and not producing the optimized output, but we cannot understand the errors. Is there somewhere we can look them up?
By the way, the device has all the latest updates installed, and the MXNet version on the device is 1.4.0.
Many thanks

Edited by: Mike9753 on Feb 10, 2020 1:02 AM

asked 4 years ago · 257 views
2 Answers

It seems that this problem is caused by the MXNet version installed on the device not matching the MXNet version SageMaker used to train the model.
After downgrading the MXNet version on the device to 1.2.0 or 1.1.0, model optimization works and the device starts producing inferences:

sudo pip3 install mxnet==1.2.0

The MXNet version used by SageMaker is set with the framework_version parameter of sagemaker.mxnet.estimator. When not specified, as in my case, it should default to 1.2.1; however, installing mxnet 1.2.1 on my device produced the same errors. (Installing 1.2.0 works fine, though.)
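Since the optimizer's error messages are opaque, a quick pre-deployment sanity check can at least surface an obvious version mismatch before the model reaches the device. The sketch below is purely illustrative: the mxnet_versions_match helper and the version strings are hypothetical, and, as noted above, even an exact match (1.2.1 trained vs. 1.2.1 on device) did not guarantee success in my case, so treat this as a rough first-pass check only.

```python
def mxnet_versions_match(trained_with: str, on_device: str) -> bool:
    """Rough heuristic: compare the full version tuples.

    Caveat (observed in this thread): an exact match is not a
    guarantee of success, and 1.2.0 on the device worked even
    though training defaulted to 1.2.1.
    """
    def parse(version: str):
        return tuple(int(part) for part in version.split("."))
    return parse(trained_with) == parse(on_device)

# Example: SageMaker's default MXNet (1.2.1) vs. the device's 1.4.0
print(mxnet_versions_match("1.2.1", "1.4.0"))  # False
```

Running this before a deployment would have flagged the 1.2.1-vs-1.4.0 mismatch that caused the cryptic mo.py errors above.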

answered 4 years ago

It would be nice if the logs reported something meaningful, such as "incompatible MXNet versions between model and device".

Edited by: Mike9753 on Feb 26, 2020 6:07 AM

answered 4 years ago
