AWS PyTorch Neuron Compilation Error


I followed the user guide on updating torch-neuron and then started compiling my model for Neuron, but I got an error that I can't make sense of. The Neuron SDK documentation claims that it should compile all operations, and that even unsupported ones should simply run on the CPU.

The error:

INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 3345, fused = 3345, percent fused = 100.0%
INFO:Neuron:Number of neuron graph operations 8175 did not match traced graph 9652 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$3362 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]} --verbose 35'
..............................................................................INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$3362; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 3345, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::add: 104 [supported]
INFO:Neuron: => aten::batch_norm: 1 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 4 [supported]
INFO:Neuron: => aten::div: 104 [supported]
INFO:Neuron: => aten::dropout: 208 [supported]
INFO:Neuron: => aten::feature_dropout: 1 [supported]
INFO:Neuron: => aten::flatten: 60 [supported]
INFO:Neuron: => aten::gelu: 52 [supported]
INFO:Neuron: => aten::layer_norm: 161 [supported]
INFO:Neuron: => aten::linear: 264 [supported]
INFO:Neuron: => aten::matmul: 104 [supported]
INFO:Neuron: => aten::mul: 52 [supported]
INFO:Neuron: => aten::permute: 210 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::reshape: 262 [supported]
INFO:Neuron: => aten::select: 104 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::size: 278 [supported]
INFO:Neuron: => aten::softmax: 52 [supported]
INFO:Neuron: => aten::transpose: 216 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
INFO:Neuron: => aten::view: 52 [supported]
Traceback (most recent call last):
  File "to_neuron.py", line 14, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs=[image.cuda()])
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
Asked 2 years ago, 759 views
1 Answer

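Hi, this error occurs because the host machine ran out of memory during compilation. A return code of -9 means the compiler process was killed with SIGKILL, which on Linux is typically what the out-of-memory killer sends. The following line in your log indicates that: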

Neuron:Compile command returned: -9
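For reference, here is a minimal sketch of how a negative return code maps to a signal name. It assumes a POSIX host, where Python's `subprocess` reports "killed by signal N" as `-N`. The helper name `describe_returncode` is made up for this illustration and is not part of torch-neuron.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code, naming the signal if it is negative."""
    if returncode < 0:
        # On POSIX, subprocess reports "terminated by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# The value reported in the log above.
print(describe_returncode(-9))
```

If the kernel's out-of-memory killer was responsible, the kernel log (for example via `dmesg`) usually records which process it killed.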

Try compiling this model on a larger instance with more memory. It doesn't have to be an inf1 instance; a regular CPU-based instance with plenty of memory will do. If that doesn't address the issue, try reducing the input shape of your tensor. You're passing a large tensor, [1, 3, 768, 768]. Try something like [1, 3, 224, 224], and if that works, run further compilations with progressively larger tensor sizes until compilation fails.
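The size sweep described above could be sketched roughly as follows. Everything here is illustrative: `largest_compilable_size`, the `try_compile` callback, and the list of candidate sizes are assumptions of this sketch, not part of the Neuron SDK. The callback should return True when compilation succeeds for a given input shape.

```python
from typing import Callable, Iterable, Optional, Tuple

Shape = Tuple[int, int, int, int]


def largest_compilable_size(
    try_compile: Callable[[Shape], bool],
    sizes: Iterable[int] = (224, 384, 512, 640, 768),  # hypothetical candidates
) -> Optional[int]:
    """Try square inputs in ascending order; return the last side length that compiled."""
    best = None
    for side in sorted(sizes):
        shape = (1, 3, side, side)
        if not try_compile(shape):
            break  # stop at the first size that fails to compile
        best = side
    return best


# Hypothetical wiring to torch-neuron (not run here; assumes `model` is defined):
#
# import torch, torch_neuron
#
# def try_compile(shape):
#     try:
#         torch.neuron.trace(model, example_inputs=[torch.rand(*shape)])
#         return True
#     except RuntimeError:  # the "aborting trace" error shown in the log
#         return False
```

Each attempt is a full compilation, so a short list of candidate sizes keeps the total time manageable.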

AWS
Answered 2 years ago
  • My instance is a g4dn.xlarge with 30 GB of memory; isn't that enough? I'll also try a smaller input size, but it's very strange that 30 GB is not enough.
