AWS PyTorch Neuron Compilation Error

I followed the user guide to update torch-neuron and then started compiling my model for Neuron, but I got an error and I can't tell from it what is wrong. The Neuron SDK documentation states that compilation should succeed even when a model contains unsupported operations; those should simply run on the CPU.

The error:

INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 3345, fused = 3345, percent fused = 100.0%
INFO:Neuron:Number of neuron graph operations 8175 did not match traced graph 9652 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$3362 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]} --verbose 35'
..............................................................................INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$3362; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 3345, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::add: 104 [supported]
INFO:Neuron: => aten::batch_norm: 1 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 4 [supported]
INFO:Neuron: => aten::div: 104 [supported]
INFO:Neuron: => aten::dropout: 208 [supported]
INFO:Neuron: => aten::feature_dropout: 1 [supported]
INFO:Neuron: => aten::flatten: 60 [supported]
INFO:Neuron: => aten::gelu: 52 [supported]
INFO:Neuron: => aten::layer_norm: 161 [supported]
INFO:Neuron: => aten::linear: 264 [supported]
INFO:Neuron: => aten::matmul: 104 [supported]
INFO:Neuron: => aten::mul: 52 [supported]
INFO:Neuron: => aten::permute: 210 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::reshape: 262 [supported]
INFO:Neuron: => aten::select: 104 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::size: 278 [supported]
INFO:Neuron: => aten::softmax: 52 [supported]
INFO:Neuron: => aten::transpose: 216 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
INFO:Neuron: => aten::view: 52 [supported]
Traceback (most recent call last):
  File "to_neuron.py", line 14, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs=[image.cuda()])
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
asked 2 years ago, 913 views
1 Answer

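Hi, this error means the compiler ran out of memory on the host machine. The -9 return code in the following line of your log means neuron-cc was killed with signal 9 (SIGKILL), which on Linux usually comes from the kernel's out-of-memory killer: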

Neuron:Compile command returned: -9
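
If it helps to see where the -9 comes from: Python's subprocess module reports a negative return code when the child process was terminated by a signal. A minimal sketch of decoding such a code follows; describe_returncode is a hypothetical helper written for this illustration, not part of torch-neuron or neuron-cc, and the signal names assume a Linux host.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Explain a subprocess return code (hypothetical helper, not a Neuron API)."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N on POSIX systems.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


print(describe_returncode(-9))
```

SIGKILL cannot be caught by the process itself, so neuron-cc gets no chance to print its own error message. Checking the kernel log (for example with dmesg) is one way to confirm whether the out-of-memory killer was responsible.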

Try compiling this model on an instance with more memory. It doesn't have to be an inf1 instance; a regular CPU-based instance with plenty of memory works for compilation. If that doesn't fix the issue, try reducing the input shape of your tensor. You're passing a large tensor, [1, 3, 768, 768]. Try something like [1, 3, 224, 224], and if that compiles, repeat the compilation with progressively larger input sizes until the compilation fails.
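
The size sweep described above could be scripted roughly like this. It is only a sketch: largest_compilable_shape and fake_compile are hypothetical names made up for the example, and fake_compile is a stand-in that simulates a failure above 512 pixels. In a real run you would pass a function that builds an example input of the given shape and calls torch.neuron.trace on your model, as in your to_neuron.py.

```python
from typing import Callable, Iterable, Optional, Tuple

Shape = Tuple[int, int, int, int]


def largest_compilable_shape(
    try_compile: Callable[[Shape], None],
    sides: Iterable[int],
) -> Optional[Shape]:
    """Try square inputs from smallest to largest; return the last shape that compiled."""
    best = None
    for side in sorted(sides):
        shape = (1, 3, side, side)
        try:
            try_compile(shape)  # expected to raise if compilation fails
        except Exception as exc:
            print(f"{shape}: failed ({exc})")
            break
        print(f"{shape}: compiled")
        best = shape
    return best


def fake_compile(shape: Shape) -> None:
    # Stand-in for the real compile call (assumption for this example only).
    if shape[2] > 512:
        raise RuntimeError("simulated compile failure")


result = largest_compilable_shape(fake_compile, [224, 384, 512, 768])
```

The sweep stops at the first failure because larger inputs will generally need at least as much compiler memory as the one that just failed.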

AWS
answered 2 years ago
  • My instance is a g4dn.xlarge with 30 GB of memory. Isn't that enough? I'll also try a smaller input size, but it's very strange that 30 GB is not enough.
