deployment of custom model to device failed

0

Hi, I'm trying to deploy a custom model trained with SageMaker to my DeepLens device. The model is based on MXNet ResNet-50 and makes good predictions when deployed on a SageMaker endpoint. However, when deploying to DeepLens we get errors when the Lambda function tries to optimize the model, and no inferences are made by the device.
The Lambda log shows this (the errors refer to mo.py lines 161 and 173 - what are these?):

[2020-02-09T19:00:28.525+02:00][ERROR]-mo.py:161,
[2020-02-09T19:00:30.088+02:00][ERROR]-mo.py:173,
[2020-02-09T19:00:30.088+02:00][INFO]-IoTDataPlane.py:115,Publishing message on topic "$aws/things/deeplens_rFCSPJQhTGS5Y9NGkJIz8g/infer" with Payload "Loading action cat-dog model"
[2020-02-09T19:00:30.088+02:00][INFO]-Lambda.py:92,Invoking Lambda function "arn:aws:lambda:::function:GGRouter" with Greengrass Message "Loading action cat-dog model"
[2020-02-09T19:00:30.088+02:00][INFO]-ipc_client.py:142,Posting work for function [arn:aws:lambda:::function:GGRouter] to http://localhost:8000/2016-11-01/functions/arn:aws:lambda:::function:GGRouter
[2020-02-09T19:00:30.099+02:00][INFO]-ipc_client.py:155,Work posted with invocation id [158058ef-7386-49c5-791a-9c61bd1b9951]
[2020-02-09T19:00:30.109+02:00][INFO]-IoTDataPlane.py:115,Publishing message on topic "$aws/things/deeplens_rFCSPJQhTGS5Y9NGkJIz8g/infer" with Payload "Error in cat-dog lambda: Model path  is invalid"
[2020-02-09T19:00:30.11+02:00][INFO]-Lambda.py:92,Invoking Lambda function "arn:aws:lambda:::function:GGRouter" with Greengrass Message "Error in cat-dog lambda: Model path  is invalid"

It seems to me that the model optimizer is failing for some reason and not producing the optimized output, but we cannot make sense of these errors. Is there somewhere we can look them up?
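For reference, the inference Lambda loads the model the same way the DeepLens sample functions do, roughly like this (the 224x224 input size is just our value, and the mo.optimize call below is taken from the DeepLens sample code from memory, so treat it as a sketch):

import mo
import awscam

# Convert the SageMaker/MXNet artifacts into the format awscam expects.
# mo.optimize returns a status code and the path of the optimized model;
# when optimization fails the path comes back empty, which matches the
# "Model path  is invalid" message in the log above.
error, model_path = mo.optimize('cat-dog', 224, 224)
if error != 0 or not model_path:
    raise Exception('Model path "{}" is invalid'.format(model_path))

# Load the optimized model onto the DeepLens GPU for inference.
model = awscam.Model(model_path, {'GPU': 1})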
BTW, the device has all the latest updates installed and the MXNet version on the device is 1.4.0.
Many thanks

Edited by: Mike9753 on Feb 10, 2020 1:02 AM

Asked 4 years ago · 257 views
2 Answers
0

It seems that this problem is caused by the MXNet version installed on the device not matching the MXNet version SageMaker used to train the model.
After downgrading the MXNet version on the device to 1.2.0 (or 1.1.0), model optimization works and the device starts producing inferences:

sudo pip3 install mxnet==1.2.0
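To confirm which version actually ended up on the device before redeploying, a quick check on the DeepLens itself is enough:

python3 -c "import mxnet; print(mxnet.__version__)"

It should print 1.2.0 after the downgrade.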

The MXNet version used by SageMaker can be set with the framework_version parameter of the sagemaker.mxnet estimator. When it is not specified, as in my case, it should default to 1.2.1; however, installing mxnet 1.2.1 on my device produced the same errors. (Installing 1.2.0 works fine, though.)
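For completeness, pinning the version on the training side looks roughly like this with the SageMaker Python SDK as it was at the time; the entry point, role, instance type, and S3 path are placeholders:

from sagemaker.mxnet import MXNet

# Pin framework_version so the training container's MXNet matches a version
# the DeepLens model optimizer can handle; everything else is a placeholder.
estimator = MXNet(entry_point='train.py',
                  role='MySageMakerRole',
                  train_instance_count=1,
                  train_instance_type='ml.p2.xlarge',
                  framework_version='1.2.1')
estimator.fit('s3://my-bucket/cat-dog-training-data')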

Answered 4 years ago
0

It would be nice if the logs reported something meaningful, for example "incompatible MXNet versions between model and device".

Edited by: Mike9753 on Feb 26, 2020 6:07 AM

Answered 4 years ago
