Trouble deploying SageMaker trained model in DeepLens

Had no problem deploying sample models to DeepLens, but really struggling to get a SageMaker trained model to deploy successfully. Here is what I have done so far:

  • Extracted images from video with AWS MediaConnect
  • Cropped images to 540x540 with PIL and stored in S3
  • Ran labeling job with Mechanical Turk to draw bounding box around object
  • Ran training job in SageMaker console using Object Detection algorithm and input size 540
  • Created model in SageMaker console from training job
  • Imported model into DeepLens console
  • Copied and modified Object Detection lambda with call to mo.optimize('model_algo_1',540,540)
  • Created and deployed project to two separate DeepLens cameras
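For reference, the cropping step can be sketched roughly like this. It assumes a centered crop (the crop position is my assumption here, any consistent position would do) and uses Pillow; the function names are just illustrative.

```python
def center_crop_box(width, height, size=540):
    """Return (left, top, right, bottom) for a centered size x size crop."""
    if width < size or height < size:
        raise ValueError("image is smaller than the requested crop")
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)


def crop_file(src_path, dst_path, size=540):
    """Crop one image file to size x size with Pillow (imported lazily)."""
    from PIL import Image  # requires the Pillow package

    with Image.open(src_path) as img:
        box = center_crop_box(img.width, img.height, size)
        img.crop(box).save(dst_path)
```

The crop size here is the same value that was later passed as the input size to the training job and to mo.optimize.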

Not getting an output stream on either DeepLens. One device is about a year old. I have run all the software updates and mo.optimize steps recommended in the troubleshooting guide. The other device is brand new out of the box, updated to latest version via DeepLens console. I have not monkeyed with the new device, so all further information will pertain to the older device. I have connected a monitor and keyboard to the older device and am working in a terminal window.

When I ran mo.optimize in a terminal, I got a warning that MXNet 1.4.0 was installed but 1.0.0 was required, along with "list index out of range" errors. I rolled MXNet back to 1.0.0 per the troubleshooting guide, using both pip and pip3.

When I import mo in Python3, I get an import error message saying "no module named mo".
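Since python and python3 keep separate package sets on the device, it may help to check what each interpreter actually sees. A minimal sketch (run it under Python 3; the helper name is made up for illustration):

```python
import subprocess
import sys


def module_version(interpreter=sys.executable, module="mxnet"):
    """Ask `interpreter` for `module.__version__`.

    Returns None if the interpreter cannot be started or the import fails,
    and "unknown" if the module has no __version__ attribute.
    """
    code = (
        "import importlib; m = importlib.import_module({!r}); "
        "print(getattr(m, '__version__', 'unknown'))"
    ).format(module)
    try:
        out = subprocess.check_output(
            [interpreter, "-c", code], stderr=subprocess.DEVNULL
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode().strip()


# Example use on the device:
#   for interp in ("python", "python3"):
#       print(interp, module_version(interp, "mxnet"))
```

The same helper can be pointed at "mo" to see which interpreter is able to import it.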

When I import mo and run mo.optimize in Python 2, I get an error that the current user lacks write permission to /opt/awscam/artifacts. When I launch Python with sudo and run mo.optimize, I get warnings about symbols saved with MXNet 1.4.0, and a "list index out of range" error.

Here is the output of mo.optimize in the Python console:

aws_cam@Deepcam:~$ python
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mo
>>> error, model_path = mo.optimize('model_algo_1',540,540)
DEBUG:mo:DLDT command: python3 /opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo_mxnet.py --input_model /opt/awscam/artifacts/model_algo_1-0000.params --data_type FP16 --scale 1 --model_name model_algo_1 --output_dir /opt/awscam/artifacts --reverse_input_channels  --input_shape [1,3,540,540]
Model Optimizer arguments
	Batch: 	1
	Precision of IR: 	FP16
	Enable fusing: 	True
	Enable gfusing: 	True
	Names of input layers: 	inherited from the model
	Path to the Input Model: 	/opt/awscam/artifacts/model_algo_1-0000.params
	Input shapes: 	[1,3,540,540]
	Log level: 	ERROR
	Mean values: 	()
	IR output name: 	model_algo_1
	Names of output layers: 	inherited from the model
	Path for generated IR: 	/opt/awscam/artifacts
	Reverse input channels: 	True
	Scale factor: 	1.0
	Scale values: 	()
	Version: 	0.3.31.d8b314f6
	Prefix name for args.nd and argx.nd files: 	
	Name of pretrained model which will be merged with .nd files: 	
ERROR:mo:[ ERROR ]  Output directory /opt/awscam/artifacts is not writable for current user. For more information please refer to Model Optimizer FAQ.
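
The "not writable" condition can be checked before calling the optimizer. A small sketch using only the standard library (the helper name is illustrative; the path is the one from the log above):

```python
import os

ARTIFACTS_DIR = "/opt/awscam/artifacts"  # output directory from the log


def can_write(directory):
    """True if the current user may create files in `directory`."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


# Example use on the device:
#   if not can_write(ARTIFACTS_DIR):
#       print("run Python with sudo, or fix ownership of", ARTIFACTS_DIR)
```

Note that this always reports True when run as root, which is why the sudo session gets past this error.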

Here is the output of mo.optimize when running Python with sudo:

aws_cam@Deepcam:~$ sudo python
[sudo] password for aws_cam: 
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mo
>>> error, model_path = mo.optimize('model_algo_1",540,540)
  File "<stdin>", line 1
    error, model_path = mo.optimize('model_algo_1",540,540)
                                                          ^
SyntaxError: EOL while scanning string literal
>>> error, model_path = mo.optimize('model_algo_1',540,540)
DEBUG:mo:DLDT command: python3 /opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo_mxnet.py --input_model /opt/awscam/artifacts/model_algo_1-0000.params --data_type FP16 --scale 1 --model_name model_algo_1 --output_dir /opt/awscam/artifacts --reverse_input_channels  --input_shape [1,3,540,540]
Model Optimizer arguments
	Batch: 	1
	Precision of IR: 	FP16
	Enable fusing: 	True
	Enable gfusing: 	True
	Names of input layers: 	inherited from the model
	Path to the Input Model: 	/opt/awscam/artifacts/model_algo_1-0000.params
	Input shapes: 	[1,3,540,540]
	Log level: 	ERROR
	Mean values: 	()
	IR output name: 	model_algo_1
	Names of output layers: 	inherited from the model
	Path for generated IR: 	/opt/awscam/artifacts
	Reverse input channels: 	True
	Scale factor: 	1.0
	Scale values: 	()
	Version: 	0.3.31.d8b314f6
	Prefix name for args.nd and argx.nd files: 	
	Name of pretrained model which will be merged with .nd files: 	
ERROR:mo:[13:54:52] src/nnvm/legacy_json_util.cc:204: Warning: loading symbol saved by MXNet version 10400 with lower version of MXNet v10000. May cause undefined behavior. Please update MXNet if you encounter any issue
[13:54:52] src/nnvm/legacy_json_util.cc:204: Warning: loading symbol saved by MXNet version 10400 with lower version of MXNet v10000. May cause undefined behavior. Please update MXNet if you encounter any issue
/usr/local/lib/python3.5/dist-packages/mxnet/module/base_module.py:53: UserWarning: You created Module with Module(..., label_names=['softmax_label']) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
	relu4_3_scale
	data
	label
  warnings.warn(msg)
[ ERROR ]  -------------------------------------------------
[ ERROR ]  ----------------- INTERNAL ERROR ----------------
[ ERROR ]  Unexpected exception happened.
[ ERROR ]  Please contact Model Optimizer developers and forward the following information:
[ ERROR ]  list index out of range
[ ERROR ]  Traceback (most recent call last):
  File "/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo/main.py", line 222, in main
    return driver(argv)
  File "/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo/main.py", line 208, in driver
    mean_scale_values=mean_scale)
  File "/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo/pipeline/mx.py", line 80, in driver
    graph = symbol2nx(model_nodes, model_params, argv.input)
  File "/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo/front/mxnet/loader.py", line 70, in symbol2nx
    model_nodes = MxnetSsdPatternMatcher.remove_and_change_layers(model_nodes)
  File "/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo/front/mxnet/mxnet_ssd_pattern_matcher.py", line 132, in remove_and_change_layers
    MxnetSsdPatternMatcher.ssd_pattern_remove_reshape(json_layers)
  File "/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo/front/mxnet/mxnet_ssd_pattern_matcher.py", line 92, in ssd_pattern_remove_reshape
    if l2['inputs'][0][0] != i or l2['op'] != 'Reshape':
IndexError: list index out of range

[ ERROR ]  ---------------- END OF BUG REPORT --------------
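
The version numbers in the legacy_json_util warning are MXNet's integer encoding (major*10000 + minor*100 + patch). A small sketch to decode them:

```python
def decode_mxnet_version(code):
    """Split MXNet's integer version into a dotted version string."""
    major, rest = divmod(code, 10000)
    minor, patch = divmod(rest, 100)
    return "{}.{}.{}".format(major, minor, patch)


saved_by = decode_mxnet_version(10400)   # version that wrote the symbol file
loaded_by = decode_mxnet_version(10000)  # version installed on the device
print(saved_by, loaded_by)
```

So the symbol file was written by MXNet 1.4.0 and is being loaded by 1.0.0, which matches the rollback described above.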

Edited by: BigEd on May 17, 2019 9:37 AM

Edited by: BigEd on May 17, 2019 11:34 AM

BigEd
asked 5 years ago, 307 views

6 Answers

Possibly making some progress. Cloned incubator-mxnet to home directory and was able to run deploy.py on the model files with no error messages. Now when I run mo.optimize, I get the following output:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mo
>>> mo.optimize('model_algo_1',540,540)
DEBUG:mo:DLDT command: python3 /opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo_mxnet.py --input_model /opt/awscam/artifacts/model_algo_1-0000.params --data_type FP16 --scale 1 --model_name model_algo_1 --output_dir /opt/awscam/artifacts --reverse_input_channels  --input_shape [1,3,540,540]
Model Optimizer arguments
        Batch:  1
        Precision of IR:        FP16
        Enable fusing:  True
        Enable gfusing:         True
        Names of input layers:  inherited from the model
        Path to the Input Model:        /opt/awscam/artifacts/model_algo_1-0000.params
        Input shapes:   [1,3,540,540]
        Log level:      ERROR
        Mean values:    ()
        IR output name:         model_algo_1
        Names of output layers:         inherited from the model
        Path for generated IR:  /opt/awscam/artifacts
        Reverse input channels:         True
        Scale factor:   1.0
        Scale values:   ()
        Version:        0.3.31.d8b314f6
        Prefix name for args.nd and argx.nd files: 
        Name of pretrained model which will be merged with .nd files: 
ERROR:mo:[ WARNING ]  
Detected not satisfied dependencies:
        mxnet: installed: 1.4.0, required: 1.0.0

Please install required versions of components or use install_prerequisites script
/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_mxnet.sh
Note that install_prerequisites scripts may install additional components.
/usr/local/lib/python3.5/dist-packages/mxnet/module/base_module.py:56: UserWarning: You created Module with Module(..., label_names=['softmax_label']) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
        data
  warnings.warn(msg)
[ ERROR ]  No or multiple placeholders in the model, but only one shape is provided, can not set it.
[ ERROR ]  The following error happened while processing input shapes: . For more information please refer to Model Optimizer FAQ.
BigEd
answered 5 years ago

Further progress. Identified mismatch between model hyperparams and deploy.py inputs. Trained new model with resnet50 and image size 512, deployed, and ran deploy.py and mo.optimize.

>>> mo.optimize('model_algo_1',512,512)
DEBUG:mo:DLDT command: python3 /opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo_mxnet.py --input_model /opt/awscam/artifacts/model_algo_1-0000.params --data_type FP16 --scale 1 --model_name model_algo_1 --output_dir /opt/awscam/artifacts --reverse_input_channels  --input_shape [1,3,512,512]
Model Optimizer arguments
        Batch:  1
        Precision of IR:        FP16
        Enable fusing:  True
        Enable gfusing:         True
        Names of input layers:  inherited from the model
        Path to the Input Model:        /opt/awscam/artifacts/model_algo_1-0000.params
        Input shapes:   [1,3,512,512]
        Log level:      ERROR
        Mean values:    ()
        IR output name:         model_algo_1
        Names of output layers:         inherited from the model
        Path for generated IR:  /opt/awscam/artifacts
        Reverse input channels:         True
        Scale factor:   1.0
        Scale values:   ()
        Version:        0.3.31.d8b314f6
        Prefix name for args.nd and argx.nd files: 
        Name of pretrained model which will be merged with .nd files: 
ERROR:mo:[ WARNING ]  
Detected not satisfied dependencies:
        mxnet: installed: 1.4.0, required: 1.0.0

Please install required versions of components or use install_prerequisites script
/opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites_mxnet.sh
Note that install_prerequisites scripts may install additional components.
/usr/local/lib/python3.5/dist-packages/mxnet/module/base_module.py:56: UserWarning: You created Module with Module(..., label_names=['softmax_label']) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
        data
  warnings.warn(msg)
[ ERROR ]  Cannot infer shapes or values for node "cls_prob".
[ ERROR ]  There is no registered "infer" function for node "cls_prob" with op = "softmax". Please implement this function in the extensions. For more information please refer to Model Optimizer FAQ.
[ ERROR ]  
[ ERROR ]  It can happen due to bug in custom shape infer function <UNKNOWN>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Stopped shape/value propagation at "cls_prob" node. For more information please refer to Model Optimizer FAQ.
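
To see which operators the optimizer is being asked to handle, the symbol file can be inspected directly. It is plain JSON with a "nodes" list where each node has an "op" and a "name". A rough sketch (helper names are illustrative, and the tiny graph at the bottom is a made-up toy, not my model):

```python
import json
from collections import Counter


def load_symbol(path):
    """Load an MXNet *-symbol.json file as a dict."""
    with open(path) as f:
        return json.load(f)


def op_histogram(symbol):
    """Count how many nodes use each operator type."""
    return Counter(node["op"] for node in symbol["nodes"])


def nodes_with_op(symbol, op):
    """Names of all nodes whose operator type equals `op`."""
    return [node["name"] for node in symbol["nodes"] if node["op"] == op]


# Toy graph for illustration only.
toy = {
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "Convolution", "name": "conv1", "inputs": [[0, 0, 0]]},
        {"op": "softmax", "name": "cls_prob", "inputs": [[1, 0, 0]]},
    ]
}
print(nodes_with_op(toy, "softmax"))
```

On the device this would be called as op_histogram(load_symbol("/opt/awscam/artifacts/model_algo_1-symbol.json")) to check whether a "softmax" node such as cls_prob is present.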

I've run the install_prerequisites script several times, but get the following error:

#details omitted
+ python3 pip3 install -r ../../../requirements_mxnet.txt
WARNING: The directory '/home/aws_cam/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
WARNING: The directory '/home/aws_cam/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '../../../requirements_mxnet.txt'
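
The requirements path in that error is relative, so where it points depends on the directory the command runs from. A quick sketch of what it resolves to from two starting points (whether the script changes directory on its own is something I have not verified):

```python
import posixpath

SCRIPT_DIR = (
    "/opt/awscam/intel/deeplearning_deploymenttoolkit/"
    "deployment_tools/model_optimizer/install_prerequisites"
)
REQUIREMENTS = "../../../requirements_mxnet.txt"  # path from the error


def resolve_from(cwd, relative):
    """Resolve `relative` as if the command were run from `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, relative))


for cwd in (SCRIPT_DIR, "/home/aws_cam"):
    print(cwd, "->", resolve_from(cwd, REQUIREMENTS))
```

If neither resolved location contains requirements_mxnet.txt, the file would have to be located separately before the script can succeed.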

Any ideas?

BigEd
answered 5 years ago

Figured there might be some issues with my model, so I went back and ran through the SageMaker example notebook for Object Detection, then deployed it. Initially got the same error as above, but then I rolled MXNet back to 1.3.0 with both pip3 and pip. Then when I ran deploy.py and mo.optimize, I got the following result:

>>> mo.optimize('model_algo_1',512,512)
DEBUG:mo:DLDT command: python3 /opt/awscam/intel/deeplearning_deploymenttoolkit/deployment_tools/model_optimizer/mo_mxnet.py --input_model /opt/awscam/artifacts/model_algo_1-0000.params --data_type FP16 --scale 1 --model_name model_algo_1 --output_dir /opt/awscam/artifacts --reverse_input_channels  --input_shape [1,3,512,512]
Model Optimizer arguments
        Batch:  1
        Precision of IR:        FP16
        Enable fusing:  True
        Enable gfusing:         True
        Names of input layers:  inherited from the model
        Path to the Input Model:        /opt/awscam/artifacts/model_algo_1-0000.params
        Input shapes:   [1,3,512,512]
        Log level:      ERROR
        Mean values:    ()
        IR output name:         model_algo_1
        Names of output layers:         inherited from the model
        Path for generated IR:  /opt/awscam/artifacts
        Reverse input channels:         True
        Scale factor:   1.0
        Scale values:   ()
        Version:        0.3.31.d8b314f6
        Prefix name for args.nd and argx.nd files: 
        Name of pretrained model which will be merged with .nd files: 
ERROR:mo:
(2, '')

According to the model optimizer documentation, a return of 2 for status means "Model optimization failed because you are using inconsistent platform versions." It recommends installing mxnet. Sigh.
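
In the lambda it is easy to miss this return value, so here is a minimal sketch of a wrapper that fails loudly on a nonzero status. The wrapper name is made up; it takes the optimize function as an argument so it can be tried without the device. Treating 0 as success is my assumption based on how the sample lambda uses the result, and 2 is the meaning quoted above.

```python
STATUS_HINTS = {
    0: "optimization succeeded",  # assumed meaning of 0
    2: "inconsistent platform versions (check the installed MXNet)",
}


def optimize_or_raise(optimize, model_name, width, height):
    """Call optimize(model_name, width, height) and return the model path.

    `optimize` is expected to return a (status, model_path) tuple,
    as mo.optimize does in the transcripts above.
    """
    status, model_path = optimize(model_name, width, height)
    if status != 0:
        hint = STATUS_HINTS.get(status, "see the Model Optimizer documentation")
        raise RuntimeError(
            "mo.optimize returned status {}: {}".format(status, hint)
        )
    return model_path


# Example use on the device:
#   import mo
#   model_path = optimize_or_raise(mo.optimize, "model_algo_1", 512, 512)
```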

BigEd
answered 5 years ago

I assume the "inconsistent platform versions" in mo.optimize status code 2 refers to the mxnet versions used to train and optimize the model. Is that correct?

As best I can tell from the SageMaker documentation, the default MXNet version for training is 1.2.1.

I ran "sudo pip3 install mxnet==1.2.1" on the DeepLens, but still get error code 2.

Does anyone have more current or accurate info on MXNet versions in SageMaker and DeepLens?

Is it possible the version conflict could be with deploy.py?

Edited by: BigEd on May 19, 2019 9:44 AM

BigEd
answered 5 years ago

Executed the git reset command to set incubator-mxnet to the commit specified in the documentation. According to the README, that version of deploy.py is compatible with MXNet 1.3.0 and earlier. Updated the awscam version of MXNet to 1.3.0. Still getting a return of status code 2 (inconsistent platform versions) from mo.optimize.

BigEd
answered 5 years ago

Believe I have resolved it. I updated awscam, rebooted, and installed mxnet version 1.4.0.

Documentation on how to use deploy.py is poor. For anyone else who is looking for the steps, do this:

  1. SSH to your DeepLens or open a terminal on it, clone the incubator-mxnet repository (which contains example/ssd) to the home directory, then run the following line in the clone to roll it back to the right commit:

git reset --hard 73d88974f8bca1e68441606fb0787a2cd17eb364

  2. Move the files model_algo_1-0000.params, model_algo_1-symbol.json, and hyperparams.json from /opt/awscam/artifacts/ to /home/incubator-mxnet/example/ssd/.

  3. Run deploy.py as follows:

sudo python deploy.py --network='resnet50' --epoch=0 --prefix='model_algo_1' --data-shape=512 --num-class=80

where --data-shape is the x and y dimension of the images you trained the model on, --num-class is the number of classes you trained it to identify, and --network should match the base network you trained with (VGG16 or resnet50). If you are successful, you should get this:

Saved model: model_algo_1-0000.params
Saved symbol: model_algo_1-symbol.json

  4. Move the 3 model files back to /opt/awscam/artifacts/. You'll probably need to use sudo mv.

  5. Launch sudo python, import mo, and run the optimizer.
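
Since my earlier failures came from a mismatch between the training hyperparameters and the deploy.py inputs, here is a rough sketch that builds the deploy.py arguments from hyperparams.json instead of typing them by hand. The key names "image_shape" and "num_classes" are an assumption on my part, so check your own hyperparams.json; the network name is passed in explicitly rather than guessed.

```python
import json


def load_hyperparams(path="hyperparams.json"):
    """Read the training hyperparameters saved with the model artifacts."""
    with open(path) as f:
        return json.load(f)


def deploy_args(hyperparams, network, prefix="model_algo_1", epoch=0):
    """Build the deploy.py command line as a list of arguments.

    Assumes `hyperparams` has "image_shape" and "num_classes" entries
    (key names are an assumption; values may be stored as strings).
    """
    return [
        "python", "deploy.py",
        "--network={}".format(network),
        "--epoch={}".format(epoch),
        "--prefix={}".format(prefix),
        "--data-shape={}".format(int(hyperparams["image_shape"])),
        "--num-class={}".format(int(hyperparams["num_classes"])),
    ]


# Example use from the example/ssd directory:
#   import subprocess
#   subprocess.check_call(deploy_args(load_hyperparams(), "resnet50"))
```

The same image size should then be passed as the width and height to mo.optimize.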

Edited by: BigEd on May 23, 2019 8:17 PM

Edited by: BigEd on May 23, 2019 8:19 PM

BigEd
answered 5 years ago
