
Machine Learning & AI

AWS offers the broadest and deepest set of machine learning services and supporting cloud infrastructure, putting machine learning in the hands of every developer, data scientist and expert practitioner. AWS is helping more than one hundred thousand customers accelerate their machine learning journey.

Recent questions


Unable to compile the BERT model from the Neuron tutorial to NEFF

Hi team, I want to compile a BERT model and run it on Inferentia. I trained my model with PyTorch and tried to convert it by following the steps in this [tutorial](https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb) on my Amazon Linux machine, but the compilation keeps failing with this error:

```
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
```

I followed the installation steps [here](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-install.html) for pytorch-1.11.0 and tried to execute the code from the tutorial, but got the same error. We want to explore using Inferentia for our large BERT model, but we are blocked by this failure to convert to the NEFF format. I also tried the equivalent steps with TensorFlow and ran into a different unsupported-ops issue. Could you please help?

Below are the setup commands I ran on my Amazon Linux desktop:

```
sudo yum install -y python3.7-venv gcc-c++
python3.7 -m venv pytorch_venv
source pytorch_venv/bin/activate
pip install -U pip

# Set Pip repository to point to the Neuron repository
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Install Neuron PyTorch
pip install torch-neuron neuron-cc[tensorflow] "protobuf<4" torchvision
pip install --upgrade "transformers==4.6.0"
pip install tensorflow==2.8.1
```
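(For reference, the minimal sketch below lists the versions that actually ended up in the virtualenv; the package names are simply the ones from the install commands above, since unsupported-operator behaviour may depend on the exact neuron-cc / transformers combination.)

```python
# Sketch: print the installed versions of the Neuron toolchain and model
# libraries. Package names are assumed from the install commands above.
import pkg_resources

for pkg in ("torch", "torch-neuron", "neuron-cc", "transformers", "tensorflow", "protobuf"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")
```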
I then executed the script below (copied from the tutorial) on my Amazon Linux host:

```python
import tensorflow  # to work around a protobuf version conflict issue
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import transformers
import os
import warnings

# Setting up NeuronCore groups for inf1.6xlarge with 16 cores
num_cores = 16  # This value should be 4 on inf1.xlarge and inf1.2xlarge
nc_env = ','.join(['1'] * num_cores)
warnings.warn("NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please \
see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes \
for more details.", DeprecationWarning)
os.environ['NEURONCORE_GROUP_SIZES'] = nc_env

# Build tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

max_length = 128
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")

# Run the original PyTorch model on the compilation example
paraphrase_classification_logits = model(**paraphrase)[0]

# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']

# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron
model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)
```

This gave me the following error:

```
2022-09-22 18:13:12.145617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 18:13:12.145649: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
sample_pytorch_model.py:14: DeprecationWarning: NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes for more details.
  for more details.", DeprecationWarning)
Downloading: 100%|██████████| 433/433 [00:00<00:00, 641kB/s]
Downloading: 100%|██████████| 213k/213k [00:00<00:00, 636kB/s]
Downloading: 100%|██████████| 436k/436k [00:00<00:00, 731kB/s]
Downloading: 100%|██████████| 29.0/29.0 [00:00<00:00, 35.2kB/s]
Downloading: 100%|██████████| 433M/433M [00:09<00:00, 45.7MB/s]
/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/transformers/modeling_utils.py:1968: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$662 with neuron-cc
INFO:Neuron:Compiling with command line: '/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
.2022-09-22 18:13:52.697717: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 18:13:52.697749: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
Compiler status ERROR
INFO:Neuron:Compile command returned: 1
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$662; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
Traceback (most recent call last):
  File "sample_pytorch_model.py", line 38, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
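(For reference, operator support can also be inspected before attempting a full trace. The sketch below mirrors the tutorial script above and assumes `torch.neuron.analyze_model` is available in this torch-neuron release; it reports per-operator support rather than producing a deployable NEFF.)

```python
# Sketch: report which operators neuron-cc can place on Neuron for this model,
# without building a NEFF. Model and inputs mirror the tutorial script above;
# torch.neuron.analyze_model is assumed to be available in this release.
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased-finetuned-mrpc", return_dict=False)

inputs = tokenizer.encode_plus(
    "The company HuggingFace is based in New York City",
    "HuggingFace's headquarters are situated in Manhattan",
    max_length=128, padding='max_length', truncation=True, return_tensors="pt")

# Logs operator support information instead of compiling the model.
torch.neuron.analyze_model(
    model,
    example_inputs=(inputs['input_ids'], inputs['attention_mask'], inputs['token_type_ids']))
```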
1 answer | 0 votes | 10 views | asked 3 days ago
