Cannot save a Neuron model after compiling it into AWS Neuron-optimized TorchScript


I can't save my Neuron model after compiling it into AWS Neuron-optimized TorchScript.

My code:

import tensorflow  # to workaround a protobuf version conflict issue
import torch
import torch.neuron
import torch.nn.functional as F
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('./data/tokenizer')
model = AutoModelForSequenceClassification.from_pretrained('data/model', return_dict=False, torchscript=True)

parsed_json_list = [
    {"premise": "I have a dog", "hypotheses": ["I love dogs", "I hate dogs"]}
]

model_inputs = tokenizer(
    [
        parsed_json["premise"]
        for parsed_json in parsed_json_list
        for _ in parsed_json["hypotheses"]
    ],
    [
        hypothesis
        for parsed_json in parsed_json_list
        for hypothesis in parsed_json["hypotheses"]
    ],
    return_tensors='pt',
    padding=True,
    truncation=True,
)

pred = model(**model_inputs)
example_inputs = (
    model_inputs['input_ids'],
    model_inputs['attention_mask'],
    model_inputs['token_type_ids'],
)

model_neuron = torch.neuron.trace(model, example_inputs, verbose=1)

Neuron log after torch.neuron.trace:

INFO:Neuron:Number of neuron graph operations 121 did not match traced graph 101 - using heuristic matching of hierarchical information
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 2131, compiled = 1992, percent compiled = 93.48%
INFO:Neuron:The neuron partitioner created 51 sub-graphs
INFO:Neuron:Neuron successfully compiled 50 sub-graphs, Total fused subgraphs = 51, Percent of model sub-graphs successfully compiled = 98.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 398
INFO:Neuron: => aten::ScalarImplicit: 2
INFO:Neuron: => aten::add: 82
INFO:Neuron: => aten::arange: 2
INFO:Neuron: => aten::bmm: 47
INFO:Neuron: => aten::clamp: 22
INFO:Neuron: => aten::contiguous: 71
INFO:Neuron: => aten::detach: 144
INFO:Neuron: => aten::div: 60
INFO:Neuron: => aten::expand: 22
INFO:Neuron: => aten::gelu: 13
INFO:Neuron: => aten::layer_norm: 26
INFO:Neuron: => aten::linear: 97
INFO:Neuron: => aten::mul: 38
INFO:Neuron: => aten::neg: 11
INFO:Neuron: => aten::permute: 71
INFO:Neuron: => aten::select: 1
INFO:Neuron: => aten::size: 460
INFO:Neuron: => aten::slice: 27
INFO:Neuron: => aten::sqrt: 36
INFO:Neuron: => aten::squeeze: 23
INFO:Neuron: => aten::sub: 1
INFO:Neuron: => aten::to: 96
INFO:Neuron: => aten::transpose: 47
INFO:Neuron: => aten::unsqueeze: 29
INFO:Neuron: => aten::view: 166
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 35 [supported]
INFO:Neuron: => aten::__and__: 1 [supported]
INFO:Neuron: => aten::abs: 1 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::bmm: 1 [supported]
INFO:Neuron: => aten::ceil: 1 [supported]
INFO:Neuron: => aten::clamp: 2 [supported]
INFO:Neuron: => aten::contiguous: 1 [supported]
INFO:Neuron: => aten::detach: 2 [supported]
INFO:Neuron: => aten::div: 2 [supported]
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::expand: 2 [supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::gt: 1 [supported]
INFO:Neuron: => aten::le: 1 [supported]
INFO:Neuron: => aten::linear: 1 [supported]
INFO:Neuron: => aten::log: 2 [supported]
INFO:Neuron: => aten::lt: 1 [supported]
INFO:Neuron: => aten::mul: 2 [supported]
INFO:Neuron: => aten::neg: 1 [supported]
INFO:Neuron: => aten::permute: 1 [supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::size: 10 [supported]
INFO:Neuron: => aten::slice: 2 [supported]
INFO:Neuron: => aten::squeeze: 2 [supported]
INFO:Neuron: => aten::to: 5 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::type_as: 2 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 2 [supported]
INFO:Neuron: => aten::where: 2 [not supported]
INFO:Neuron:skip_inference_context for tensorboard symbols at /home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py:305 tb_parse
INFO:Neuron:Number of neuron graph operations 468 did not match traced graph 599 - using heuristic matching of hierarchical information
CPU times: user 1min 50s, sys: 13.1 s, total: 2min 3s
Wall time: 7min 37s

Then I try to save the model:

model_neuron.save('test.pt')

Error log:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_17776/1444584018.py in <module>
----> 1 model_neuron.save('test.pt')

~/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_script.py in save(self, f, **kwargs)
    691             See :func:`torch.jit.save <torch.jit.save>` for details.
    692             """
--> 693             return self._c.save(str(f), **kwargs)
    694 
    695         def _save_for_lite_interpreter(self, *args, **kwargs):

RuntimeError: 
Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/native_ops/prim.py(46): PythonOp
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(330): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
<timed exec>(1): <module>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magics/execution.py(1335): time
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/magic.py(187): <lambda>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/decorator.py(232): fun
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2473): run_cell_magic
/tmp/ipykernel_17776/2573155944.py(1): <module>
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/test_aws_inf_neuron/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main

Thanks in advance.

1 Answer

We have not seen this exact serialization issue before, but this model is unlikely to perform well because it has unsupported operators (aten::gather and aten::repeat) evenly distributed throughout the model. Unsupported operators are executed on the CPU. For this model, that produces 51 Neuron subgraphs with CPU operators in between, which will most likely cause performance issues due to excessive data transfer between the CPU and the NeuronCores. We have previously seen an issue like this with the DeBERTa model, which uses gathers and repeats in each model layer.

We are looking into the possibility of improving the performance of these models, but currently this model will not work well on Inferentia.
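
To see which operators fall back to the CPU, the "Not compiled operators" section of the trace log in the question can be summarized with a short script. This is only a minimal sketch: it assumes the log line format shown above, uses an abbreviated excerpt of that log, and the helper name `unsupported_ops` is hypothetical (it is not part of torch-neuron).

```python
import re
from collections import Counter

# Abbreviated excerpt of the "Not compiled operators" section of the trace log
# from the question. The parsing below assumes this exact line format.
LOG = """\
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 35 [supported]
INFO:Neuron: => aten::embedding: 1 [not supported]
INFO:Neuron: => aten::gather: 24 [not supported]
INFO:Neuron: => aten::repeat: 24 [not supported]
INFO:Neuron: => aten::sign: 1 [not supported]
INFO:Neuron: => aten::where: 2 [not supported]
"""

# Matches lines such as "=> aten::gather: 24 [not supported]".
LINE_RE = re.compile(r"=> (aten::\w+): (\d+) \[(supported|not supported)\]")


def unsupported_ops(log_text):
    """Hypothetical helper: count operators the log marks as '[not supported]'."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(3) == "not supported":
            counts[match.group(1)] += int(match.group(2))
    return counts


ops = unsupported_ops(LOG)
for name, count in ops.most_common():
    print(f"{name}: {count}")
```

In this excerpt, aten::gather and aten::repeat account for 48 of the 52 unsupported operator instances, which matches the operators named in the answer.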

AWS
answered 2 years ago
