Cannot save a Neuron model after compiling it into AWS Neuron optimized TorchScript

Question

I cannot save my Neuron model after compiling it into AWS Neuron optimized TorchScript.

My code:
```
import tensorflow  # to workaround a protobuf version conflict issue
import torch
import torch.neuron
import torch.nn.functional as F
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('./data/tokenizer')
model = AutoModelForSequenceClassification.from_pretrained('data/model', return_dict=False, torchscript=True)

parsed_json_list = [
            {"premise": "I have a dog", "hypotheses": ["I love dogs", "I hate dogs"]}
        ]

model_inputs = tokenizer(
    [
        parsed_json["premise"]
        for parsed_json in parsed_json_list
        for _ in parsed_json["hypotheses"]
    ],
    [
        hypothesis
        for parsed_json in parsed_json_list
        for hypothesis in parsed_json["hypotheses"]
    ],
    return_tensors='pt',
    padding=True,
    truncation=True,
)

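# Sanity-check the model on CPU before tracing
pred = model(**model_inputs)
# torch.neuron.trace takes the example inputs as a positional tuple
example_inputs = (model_inputs['input_ids'], model_inputs['attention_mask'], model_inputs['token_type_ids'])

# Compile for Inferentia; operators Neuron does not support stay on CPU
model_neuron = torch.neuron.trace(model, example_inputs, verbose=1)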
```
Neuron log after torch.neuron.trace:

```
INFO:Neuron:Number of neuron graph operations 121 did not match traced graph 101 - using heuristic matching of hierarchical information
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 2131, compiled = 1992, percent compiled = 93.48%
INFO:Neuron:The neuron partitioner created 51 sub-graphs
INFO:Neuron:Neuron successfully compiled 50 sub-graphs, Total fused subgraphs = 51, Percent of model sub-graphs successfully compiled = 98.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 398
INFO:Neuron: => aten::ScalarImplicit: 2
INFO:Neuron: => aten::add: 82
INFO:Neuron: => aten::arange: 2
INFO:Neuron: => aten::bmm: 47
INFO:Neuron: => aten::clamp: 22
INFO:Neuron: => aten::contiguous: 71
INFO:Neuron: => aten::detach: 144
INFO:Neuron: => aten::div: 60
INFO:Neuron: => aten::expand: 22
INFO:Neuron: => aten::gelu: 13
INFO:Neuron: => aten::layer_norm: 26
INFO:Neuron: => aten::linear: 97
INFO:Neuron: => aten::mul: 38
INFO:Neuron: => aten::neg: 11
INFO:Neuron: => aten::permute: 71
INFO:Neuron: => aten::select: 1
INFO:Neuron: => aten::size: 460
INFO:Neuron: => aten::slice: 27
INFO:Neuron: => aten::sqrt: 36
INFO:Neuron: => aten::squeeze: 23
INFO:Neuron: => aten::sub: 1
INFO:Neuron: => aten::to: 96
INFO:Neuron: => aten::transpose: 47
INFO:Neuron: => aten::unsqueeze: 29
INFO:Neuron: => aten::view: 166
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 35 [supported]
INFO:Neuron: => aten::__and__: 1 [supported]
INFO:Neuron: => aten::abs: 1 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::bmm: 1 [supported]
INFO:Neuron: => aten::ceil: 1 [supported]
INFO:Neuron: => aten::clamp: 2 [supported]
INFO:Neuron: => aten::contiguous: 1 [supported]
INFO:Neuron: => aten::detach: 2 [supported]
INFO:Neuron: => aten::div: 2 [supported]
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::expand: 2 [supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::gt: 1 [supported]
INFO:Neuron: => aten::le: 1 [supported]
INFO:Neuron: => aten::linear: 1 [supported]
INFO:Neuron: => aten::log: 2 [supported]
INFO:Neuron: => aten::lt: 1 [supported]
INFO:Neuron: => aten::mul: 2 [supported]
INFO:Neuron: => aten::neg: 1 [supported]
INFO:Neuron: => aten::permute: 1 [supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::size: 10 [supported]
INFO:Neuron: => aten::slice: 2 [supported]
INFO:Neuron: => aten::squeeze: 2 [supported]
INFO:Neuron: => aten::to: 5 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::type_as: 2 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 2 [supported]
INFO:Neuron: => aten::where: 2 [not supported]
INFO:Neuron:skip_inference_context for tensorboard symbols at /home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py:305 tb_parse
INFO:Neuron:Number of neuron graph operations 468 did not match traced graph 599 - using heuristic matching of hierarchical information
CPU times: user 1min 50s, sys: 13.1 s, total: 2min 3s
Wall time: 7min 37s
```
Then I try to save the model:
```
model_neuron.save('test.pt')
```
Error log:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_17776/1444584018.py in 
----> 1 model_neuron.save('test.pt')

~/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_script.py in save(self, f, **kwargs)
    691             See :func:`torch.jit.save ` for details.
    692             """
--> 693             return self._c.save(str(f), **kwargs)
    694 
    695         def _save_for_lite_interpreter(self, *args, **kwargs):

RuntimeError: 
Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/native_ops/prim.py(46): PythonOp
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(330): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
(1): 
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magics/execution.py(1335): time
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magic.py(187): 
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/decorator.py(232): fun
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2473): run_cell_magic
/tmp/ipykernel_17776/2573155944.py(1): 
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): 
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
```

Thanks in advance.

Answer

We have not seen this exact serialization error before. Separately, this model is unlikely to perform well on Inferentia because its unsupported operators (aten::gather and aten::repeat, 24 occurrences each in the log above) are spread evenly throughout the model. Unsupported operators are executed on the CPU, so for this model the partitioner produces 51 Neuron subgraphs with CPU operators in between. This will most likely cause performance problems because of excessive data transfer between the CPU and the NeuronCores. We have previously seen the same pattern with the DeBERTa model, which uses gather and repeat in each model layer.
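
To make the point about unsupported operators concrete, here is a minimal sketch that pulls the operators tagged `[not supported]` out of a trace log. It uses only the Python standard library and a few lines copied from the log in the question. The helper name `unsupported_ops` is hypothetical, and the regular expression assumes the log format shown above, which may differ in other torch-neuron versions.

```python
import re

# A few lines copied from the trace log in the question (not the full log).
LOG_EXCERPT = """\
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 35 [supported]
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::where: 2 [not supported]
"""

# Assumed line shape: "=> aten::<name>: <count> [not supported]"
OP_LINE = re.compile(r"=> (aten::\w+): (\d+) \[not supported\]")


def unsupported_ops(log_text):
    """Return {operator: count} for log lines tagged [not supported]."""
    counts = {}
    for line in log_text.splitlines():
        match = OP_LINE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


if __name__ == "__main__":
    # List the most frequent unsupported operators first.
    ops = unsupported_ops(LOG_EXCERPT)
    for op, count in sorted(ops.items(), key=lambda kv: -kv[1]):
        print(f"{op}: {count}")
```

Operators with a high count here (aten::gather and aten::repeat in this log) are the ones most likely to split the model into many small subgraphs.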

We are looking into ways to improve the performance of these models, but for now they will not run well on Inferentia.
