By using AWS re:Post, you agree to the Terms of Use

Not able to convert Hugging Face fine-tuned BERT model into AWS Neuron


Hi Team,

I have a fine-tuned BERT model which was trained using following libraries. torch == 1.8.1+cu111 transformers == 4.19.4

And not able to convert that fine-tuned BERT model into AWS neuron and getting following compilation errors. Could you please help me to resolve this issue?

Note: Trying to compile BERT model on SageMaker notebook instance and with "conda_python3" conda environment.


Set Pip repository to point to the Neuron repository

!pip config set global.extra-index-url

Install Neuron PyTorch - Note: Tried both options below.

"#!pip install torch-neuron==1.8.1.* neuron-cc[tensorflow] "protobuf<4" torchvision sagemaker>=2.79.0 transformers==4.17.0 --upgrade" !pip install --upgrade torch-neuron neuron-cc[tensorflow] "protobuf<4" torchvision

Model compilation:

import os
import tensorflow  # to workaround a protobuf version conflict issue
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_path = 'model/' # Model artifacts are stored in 'model/' directory

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, torchscript=True)

# create dummy input for max length 128
dummy_input = "dummy input which will be padded later"
max_length = 128
embeddings = tokenizer(dummy_input, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
neuron_inputs = tuple(embeddings.values())

# compile model with torch.neuron.trace and update config
model_neuron = torch.neuron.trace(model, neuron_inputs)
model.config.update({"traced_sequence_length": max_length})

# save tokenizer, neuron model and config for later use

Model artifacts: We have got this model artifacts from multi-label topic classification model.

config.json model.tar.gz pytorch_model.bin special_tokens_map.json tokenizer_config.json tokenizer.json

Error logs:

INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/ops/ where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:Neuron:Compiling function _NeuronGraph$698 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$698; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-97bba321d013> in <module>
     19 # compile model with torch.neuron.trace and update config
---> 20 model_neuron = torch.neuron.trace(model, neuron_inputs)
     21 model.config.update({"traced_sequence_length": max_length})

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/ in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
    182         logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    183         neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 184     cu.stats_post_compiler(neuron_graph)
    186     # Wrap the compiled version of the model in a script module. Note that this is

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/ in stats_post_compiler(self, neuron_graph)
    491         if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
    492             raise RuntimeError(
--> 493                 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    495         if percent_operations_compiled < 50.0:

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!

Thanks a lot.

1 Answers

The error message in your log shows Compile command returned: -9. This message typically indicates that the compiler process was killed. Normally this is due to the the OOM (out of memory) killer (run by the linux operating system) killing the compilation process due to memory exhaustion. The most recent version of torch-neuron should provide an updated message for -9 errors that reflects the typical cause for this failure mode.

We recommend you try compiling on an instance with more memory, such as an inf1.6xlarge. Note: you only need the larger instance for compilation; you can still use a smaller instance (such as an inf1.xlarge) to run inference.

Please let us know if compiling on a larger instance resolved the error you’re seeing.

answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions