Questions tagged with AWS Inferentia


Neuron model loads when compiled for 1 core but fails to load when compiled for 4

Hello,

We are testing the pipeline mode for Neuron/Inferentia, but cannot get a model running on multiple cores. The single-core compiled model loads fine and runs inference on Inferentia without issue. However, after compiling a model for multi-core using `compiler_args=['--neuroncore-pipeline-cores', '4']` (which takes ~16 hours on an r6a.16xlarge), the model errors out while loading into memory on the Inferentia box. Here's the error message:

```
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 589824
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:copy_and_stage_mr_one_channel Failed to allocate aligned (0) buffer in MLA DRAM for W10-t of size 589824 bytes, channel 0
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:kbl_model_add copy_and_stage_mr() error
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 16777216
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:dma_ring_alloc Failed to allocate RX ring
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:drs_create_data_refill_rings Failed to allocate pring for data refill dma
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:kbl_model_add create_data_refill_rings() error
2022-Nov-22 22:29:26.0116 20764:20764 ERROR TDRV:remove_model Unknown model: 1001
2022-Nov-22 22:29:26.0116 20764:20764 ERROR TDRV:kbl_model_remove Failed to find and remove model: 1001
2022-Nov-22 22:29:26.0117 20764:20764 ERROR TDRV:remove_model Unknown model: 1001
2022-Nov-22 22:29:26.0117 20764:20764 ERROR TDRV:kbl_model_remove Failed to find and remove model: 1001
2022-Nov-22 22:29:26.0117 20764:20764 ERROR NMGR:dlr_kelf_stage Failed to load subgraph
2022-Nov-22 22:29:26.0354 20764:20764 ERROR NMGR:stage_kelf_models Failed to stage graph: kelf-a.json to NeuronCore
2022-Nov-22 22:29:26.0364 20764:20764 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: 1.11.7.0+aec18907e-/tmp/tmpab7oth00, err: 4
Traceback (most recent call last):
  File "infer_test.py", line 34, in <module>
    model_neuron = torch.jit.load('model-4c.pt')
  File "/root/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/jit_load_wrapper.py", line 13, in wrapper
    script_module = jit_load(*args, **kwargs)
  File "/root/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: Could not load the model status=4 message=Allocation Failure
```

Any help would be appreciated.
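As a first diagnostic step, the failed `dmem_alloc` requests in a log like the one above can be extracted and sized. This is only a sketch: `failed_device_allocations` and `sample_log` are illustrative names, and the regular expression assumes the exact log wording shown in this post.

```python
import re

# Matches the allocation-failure lines as they are worded in the posted log.
ALLOC_FAILURE = re.compile(r"TDRV:dmem_alloc Failed to alloc DEVICE memory: (\d+)")


def failed_device_allocations(log_text):
    """Return the byte count of every failed device-memory allocation."""
    return [int(size) for size in ALLOC_FAILURE.findall(log_text)]


# Two lines copied from the post, used here as sample input.
sample_log = """\
2022-Nov-22 22:29:25.0728 20764:22801 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 589824
2022-Nov-22 22:29:26.0091 20764:22799 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 16777216
"""

for size in failed_device_allocations(sample_log):
    print(f"{size} bytes ({size / 2**20:.2f} MiB)")
```

Both failed requests are small (roughly 0.56 MiB and 16 MiB), which may suggest that device memory was already nearly exhausted when they were made, rather than any single buffer being too large. That reading is a guess from the log alone.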
1 answer, 0 votes, 32 views, asked 16 days ago

Neuron compiling a BERT model

Hi,

I want to Neuron-compile a BERT-large model (patentbert from Google) which has sequence length 512. How do I do this? I also want to call the model as before, or need to know what I should change when calling it.

I compiled anferico/bert-for-patents using the instructions here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html. Also, is using patentbert from Google the same as using anferico/bert-for-patents from Hugging Face?

The saved model CLI output for:

1. Google's patentbert

    MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

    signature_def['serving_default']:
      The given SavedModel SignatureDef contains the following input(s):
        inputs['input_ids'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 512)
            name: input_ids:0
        inputs['input_mask'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 512)
            name: input_mask:0
        inputs['mlm_positions'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 45)
            name: mlm_positions:0
        inputs['segment_ids'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 512)
            name: segment_ids:0
      The given SavedModel SignatureDef contains the following output(s):
        outputs['cls_token'] tensor_info:
            dtype: DT_FLOAT
            shape: (-1, 1024)
            name: Squeeze:0
        outputs['encoder_layer'] tensor_info:
            dtype: DT_FLOAT
            shape: (-1, 512, 1024)
            name: bert/encoder/layer_23/output/LayerNorm/batchnorm/add_1:0
        outputs['mlm_logits'] tensor_info:
            dtype: DT_FLOAT
            shape: (-1, 39859)
            name: cls/predictions/BiasAdd:0
        outputs['next_sentence_logits'] tensor_info:
            dtype: DT_FLOAT
            shape: (-1, 2)
            name: cls/seq_relationship/BiasAdd:0
      Method name is: tensorflow/serving/predict

    MetaGraphDef with tag-set: 'serve, tpu' contains the following SignatureDefs:

    signature_def['serving_default']:
      The given SavedModel SignatureDef contains the following input(s):
        inputs['input_ids'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 512)
            name: input_ids:0
        inputs['input_mask'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 512)
            name: input_mask:0
        inputs['mlm_positions'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 45)
            name: mlm_positions:0
        inputs['segment_ids'] tensor_info:
            dtype: DT_INT64
            shape: (-1, 512)
            name: segment_ids:0
      The given SavedModel SignatureDef contains the following output(s):
        outputs['cls_token'] tensor_info:
            dtype: DT_FLOAT
            shape: unknown_rank
            name: TPUPartitionedCall:5
        outputs['encoder_layer'] tensor_info:
            dtype: DT_FLOAT
            shape: unknown_rank
            name: TPUPartitionedCall:6
        outputs['mlm_logits'] tensor_info:
            dtype: DT_FLOAT
            shape: unknown_rank
            name: TPUPartitionedCall:4
        outputs['next_sentence_logits'] tensor_info:
            dtype: DT_FLOAT
            shape: unknown_rank
            name: TPUPartitionedCall:7
      Method name is: tensorflow/serving/predict

2. Neuron compiled anferico/bert-for-patents

    MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

    signature_def['__saved_model_init_op']:
      The given SavedModel SignatureDef contains the following input(s):
      The given SavedModel SignatureDef contains the following output(s):
        outputs['__saved_model_init_op'] tensor_info:
            dtype: DT_INVALID
            shape: unknown_rank
            name: NoOp
      Method name is:

    signature_def['serving_default']:
      The given SavedModel SignatureDef contains the following input(s):
        inputs['input_1'] tensor_info:
            dtype: DT_INT32
            shape: (-1, 512)
            name: serving_default_input_1:0
        inputs['input_2'] tensor_info:
            dtype: DT_INT32
            shape: (-1, 512)
            name: serving_default_input_2:0
      The given SavedModel SignatureDef contains the following output(s):
        outputs['output_1'] tensor_info:
            dtype: DT_FLOAT
            shape: (-1, 2)
            name: StatefulPartitionedCall:0
      Method name is: tensorflow/serving/predict

    Concrete Functions:
      Function Name: '__call__'
        Option #1
          Callable with:
            Argument #1
              DType: list
              Value: [TensorSpec(shape=(None, 512), dtype=tf.int32, name='input_1'), TensorSpec(shape=(None, 512), dtype=tf.int32, name='input_2')]

      Function Name: '_default_save_signature'
        Option #1
          Callable with:
            Argument #1
              DType: list
              Value: [TensorSpec(shape=(None, 512), dtype=tf.int32, name='input_1'), TensorSpec(shape=(None, 512), dtype=tf.int32, name='input_2')]

      Function Name: 'aws_neuron_function'
        Option #1
          Callable with:
            Argument #1
              args_0
            Argument #2
              args_0_1

      Function Name: 'call_and_return_all_conditional_losses'
        Option #1
          Callable with:
            Argument #1
              DType: list
              Value: [TensorSpec(shape=(None, 512), dtype=tf.int32, name='input_1'), TensorSpec(shape=(None, 512), dtype=tf.int32, name='input_2')]

This is the stack trace when I replace the original BERT model with the Neuron compiled model:

    Traceback (most recent call last):
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1645, in _call_with_flat_signature
        args.append(kwargs.pop(compat.as_str(keyword)))
    KeyError: 'input_1'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1615, in _call_impl
        cancellation_manager)
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1651, in _call_with_flat_signature
        raise TypeError(f"{self._flat_signature_summary()} missing required "
    TypeError: signature_wrapper(input_1, input_2) missing required arguments: input_1, input_2.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "main.py", line 319, in <module>
        verbose)
      File "main.py", line 223, in process_each_project
        verbose=verbose)
      File "/home/ubuntu/ranking_pipeline/rank_utils.py", line 437, in rank
        response, inputs, _ = self.model.predict(search_sentences)
      File "/home/ubuntu/ranking_pipeline/bert_utils.py", line 292, in predict
        inputs['mlm_ids'], dtype=tf.int64),
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1601, in __call__
        return self._call_impl(args, kwargs)
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1617, in _call_impl
        raise structured_err
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1611, in _call_impl
        cancellation_manager)
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1688, in _call_with_structured_signature
        self._structured_signature_check_missing_args(args, kwargs)
      File "/home/ubuntu/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1707, in _structured_signature_check_missing_args
        raise TypeError(f"{self._structured_signature_summary()} missing "
    TypeError: signature_wrapper(*, input_2, input_1) missing required arguments: input_1, input_2.

Thanks in advance,
Warm regards,
Ajay
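The `KeyError: 'input_1'` in the trace suggests the calling code still passes the original input names, while the compiled signature only accepts `input_1` and `input_2`. A minimal sketch of renaming the inputs before the call is below. `remap_inputs` and `NAME_MAP` are hypothetical helpers, and the mapping itself (token ids to `input_1`, mask to `input_2`) is an assumption that should be checked against how the model was traced.

```python
def remap_inputs(features, name_map):
    """Rename the keys of `features` according to `name_map` (old -> new).

    Keys not listed in `name_map` are dropped, because the compiled
    signature shown in the post accepts only the renamed inputs.
    """
    missing = [old for old in name_map if old not in features]
    if missing:
        raise KeyError(f"features is missing expected inputs: {missing}")
    return {new: features[old] for old, new in name_map.items()}


# ASSUMPTION: input_1 carries the token ids and input_2 the attention mask.
NAME_MAP = {"input_ids": "input_1", "input_mask": "input_2"}

# Toy values standing in for real (batch, 512) tensors.
features = {
    "input_ids": [[101, 2023, 102]],
    "input_mask": [[1, 1, 1]],
    "segment_ids": [[0, 0, 0]],
}
print(remap_inputs(features, NAME_MAP))
```

Two further differences are visible in the signatures above and would also need handling: the compiled model expects `DT_INT32` inputs where the original took `DT_INT64`, and it returns a single `output_1` of shape `(-1, 2)` rather than `cls_token`, `encoder_layer`, `mlm_logits` and `next_sentence_logits`. So the two models do not look interchangeable as posted.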
1 answer, 0 votes, 82 views, asked a month ago

Issue with loading neuron model

I am trying to load a Neuron compiled model generated as described in https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html. I am still a newbie, so please excuse my mistakes.

This is my code for loading a Neuron compiled model. It is almost entirely based on the code on the page referred to above.

    from transformers import pipeline
    import tensorflow as tf
    import tensorflow.neuron as tfn

    class TFBertForSequenceClassificationDictIO(tf.keras.Model):
        def __init__(self, model_wrapped):
            super().__init__()
            self.model_wrapped = model_wrapped
            self.aws_neuron_function = model_wrapped.aws_neuron_function

        def call(self, inputs):
            input_ids = inputs['input_ids']
            attention_mask = inputs['attention_mask']
            logits = self.model_wrapped([input_ids, attention_mask])
            return [logits]

    class TFBertForSequenceClassificationFlatIO(tf.keras.Model):
        def __init__(self, model):
            super().__init__()
            self.model = model

        def call(self, inputs):
            input_ids, attention_mask = inputs
            output = self.model({'input_ids': input_ids, 'attention_mask': attention_mask})
            return output['logits']

    string_inputs = [
        'I love to eat pizza!',
        'I am sorry. I really want to like it, but I just can not stand sushi.',
        'I really do not want to type out 128 strings to create batch 128 data.',
        'Ah! Multiplying this list by 32 would be a great solution!',
    ]
    string_inputs = string_inputs * 32

    model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
    neuron_pipe = pipeline('sentiment-analysis', model=model_name, framework='tf')
    example_inputs = neuron_pipe.tokenizer(string_inputs)
    pipe = pipeline('sentiment-analysis', model=model_name, framework='tf')
    reloaded_model = tf.keras.models.load_model('./distilbert_b128_2')
    model_wrapped = TFBertForSequenceClassificationFlatIO(pipe.model)
    example_inputs_list = [example_inputs['input_ids'], example_inputs['attention_mask']]
    model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list)
    rewrapped_model = TFBertForSequenceClassificationDictIO(model_wrapped_traced)

This is the stack trace:

    2022-11-05 02:46:55.553817: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
    All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.
    All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
    Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
    - This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_39']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
    Traceback (most recent call last):
      File "inferencesmall.py", line 40, in <module>
        model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow_neuron/python/_trace.py", line 167, in trace
        func = func.get_concrete_function(*example_inputs)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
        concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
        self._initialize(args, kwargs, add_initializers_to=initializers)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 786, in _initialize
        *args, **kwds))
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
        graph_function, _ = self._maybe_define_function(args, kwargs)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
        graph_function = self._create_graph_function(args, kwargs)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3140, in _create_graph_function
        capture_by_value=self._capture_by_value),
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
        out = weak_wrapped_fn().__wrapped__(*args, **kwds)
      File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
        raise e.ag_error_metadata.to_exception(e)
    tensorflow.python.autograph.impl.api.StagingError: in user code:

        File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 987, in error_handler  *
            return fn(*args, **kwargs)

        StagingError: Exception encountered when calling layer "tf_bert_for_sequence_classification_flat_io" (type TFBertForSequenceClassificationFlatIO).

        in user code:

            File "inferencesmall.py", line 22, in call  *
                output = self.model({'input_ids': input_ids, 'attention_mask': attention_mask})
            File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
                raise e.with_traceback(filtered_tb) from None

            StagingError: Exception encountered when calling layer "tf_distil_bert_for_sequence_classification_1" (type TFDistilBertForSequenceClassification).

            in user code:

                File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/transformers/models/distilbert/modeling_tf_distilbert.py", line 798, in call  *
                    distilbert_output = self.distilbert(
                File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
                    raise e.with_traceback(filtered_tb) from None

                StagingError: Exception encountered when calling layer "distilbert" (type TFDistilBertMainLayer).

                in user code:

                    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/transformers/models/distilbert/modeling_tf_distilbert.py", line 423, in call  *
                        inputs = input_processing(
                    File "/home/ubuntu/huggingface/venv/lib/python3.7/site-packages/transformers/modeling_tf_utils.py", line 372, in input_processing  *
                        output[parameter_names[i]] = input

                    IndexError: list index out of range

Thanks in advance,
Ajay
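One thing that stands out in the posted code is that the tokenizer is called with its defaults, so it returns unpadded Python lists rather than fixed-shape TensorFlow tensors. That may be what `input_processing` trips over, though the post does not confirm it. A possible sketch of forcing a static shape is below. `make_static_shape_tokenizer` and `fake_tokenizer` are hypothetical names, and `max_length=128` is an assumption. The keyword arguments themselves are standard Hugging Face tokenizer options.

```python
def make_static_shape_tokenizer(tokenizer, max_length=128):
    """Wrap a tokenizer callable so every call pads/truncates to max_length."""

    def wrapper(*args, **kwargs):
        # Override whatever the caller passed so the output shape is fixed.
        kwargs["padding"] = "max_length"
        kwargs["max_length"] = max_length
        kwargs["truncation"] = True
        kwargs["return_tensors"] = "tf"
        return tokenizer(*args, **kwargs)

    return wrapper


# Stand-in tokenizer for demonstration only; it echoes its keyword arguments.
def fake_tokenizer(texts, **kwargs):
    return {"n_texts": len(texts), **kwargs}


static_tokenizer = make_static_shape_tokenizer(fake_tokenizer, max_length=128)
print(static_tokenizer(["I love to eat pizza!"]))
```

With a real pipeline the wrapper would be applied as `neuron_pipe.tokenizer = make_static_shape_tokenizer(pipe.tokenizer)` before tokenizing `string_inputs`. Separately, `reloaded_model` is loaded in the posted code but never used. The script traces `pipe.model` again with `tfn.trace` instead of wrapping the model it just loaded.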
1 answer, 0 votes, 74 views, asked a month ago

neuron compiling bert model for inferentia on tf2

Hi,

This link, https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.html, describes how to compile using TensorFlow 1. Can anyone let me know the steps to Neuron-compile a BERT-large model for running inference on Inferentia using TensorFlow v2?

Thanks in advance,
Ajay

P.S. This is what my log looks like while compiling on TF1:

    INFO:tensorflow:fusing subgraph {subgraph neuron_op_e76ab3d9bc74f09f with input tensors ["<tf.Tensor 'bert/encoder/ones0/_0:0' shape=(1, 512, 1) dtype=float32>", "<tf.Tensor 'bert/encoder/Cast0/_1:0' shape=(1, 1, 512) dtype=float32>", "<tf.Tensor 'bert/embeddings/LayerNorm/batchnorm/add_10/_2:0' shape=(1, 512, 1024) dtype=float32>"], output tensors ["<tf.Tensor 'bert/pooler/dense/Tanh:0' shape=(1, 1024) dtype=float32>", "<tf.Tensor 'bert/encoder/layer_23/output/LayerNorm/batchnorm/add_1:0' shape=(1, 512, 1024) dtype=float32>"]} with neuron-cc
    . Compiler status ERROR
    WARNING:tensorflow:11/03/2022 04:28:48 AM ERROR 9932 [neuron-cc]: Failed to parse model /tmp/tmpbyvnmr6h/neuron_op_e76ab3d9bc74f09f/graph_def.pb: The following operators are not implemented: {'Einsum'} (NotImplementedError)
    INFO:tensorflow:Number of operations in TensorFlow session: 7427
    INFO:tensorflow:Number of operations after tf.neuron optimizations: 2901
    INFO:tensorflow:Number of operations placed on Neuron runtime: 0
    WARNING:tensorflow:Converted /home/ubuntu/bert_repo/patent_model/ to ./bert-saved-model-neuron_tf1.15 but no operator will be running on AWS machine learning accelerators. This is probably not what you want. Please refer to https://github.com/aws/aws-neuron-sdk for current limitations of the AWS Neuron SDK. We are actively improving (and hiring)!
    {'OnNeuronRatio': 0.0}

I assume the OnNeuronRatio being 0 means that I won't be able to make use of Inferentia hardware acceleration. Is that correct?
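The two facts that matter in this log (which operator the compiler rejected, and how many operations ended up on Neuron) can be pulled out mechanically. The sketch below assumes the log wording shown in this post. `unsupported_operators`, `placed_on_neuron` and `sample_log` are illustrative names, not part of the Neuron SDK.

```python
import re


def unsupported_operators(log_text):
    """Return the sorted operator names that neuron-cc reported as not implemented."""
    ops = set()
    for group in re.findall(r"operators are not implemented: \{([^}]*)\}", log_text):
        ops.update(re.findall(r"'([^']+)'", group))
    return sorted(ops)


def placed_on_neuron(log_text):
    """Return the reported count of operations placed on the Neuron runtime, if any."""
    match = re.search(r"Number of operations placed on Neuron runtime: (\d+)", log_text)
    return int(match.group(1)) if match else None


# Lines copied from the post, used here as sample input.
sample_log = """\
WARNING:tensorflow:11/03/2022 04:28:48 AM ERROR 9932 [neuron-cc]: Failed to parse model /tmp/tmpbyvnmr6h/neuron_op_e76ab3d9bc74f09f/graph_def.pb: The following operators are not implemented: {'Einsum'} (NotImplementedError)
INFO:tensorflow:Number of operations after tf.neuron optimizations: 2901
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
"""

print(unsupported_operators(sample_log))
print(placed_on_neuron(sample_log))
```

For the log in this post, the helper reports `Einsum` as the rejected operator and 0 operations placed on the Neuron runtime. That matches the `{'OnNeuronRatio': 0.0}` line and the warning that no operator will run on the accelerator.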
1 answer, 0 votes, 29 views, asked a month ago

AWS PyTorch Neuron Compilation Error

I followed the user guide on updating torch-neuron and then started compiling the model for Neuron, but got an error that I don't understand. In the Neuron SDK docs you claim that it should compile all operations, and that unsupported ones should just run on the CPU.

The error:

```
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 3345, fused = 3345, percent fused = 100.0%
INFO:Neuron:Number of neuron graph operations 8175 did not match traced graph 9652 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$3362 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]} --verbose 35'
..............................................................................INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$3362; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 3345, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::add: 104 [supported]
INFO:Neuron: => aten::batch_norm: 1 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 4 [supported]
INFO:Neuron: => aten::div: 104 [supported]
INFO:Neuron: => aten::dropout: 208 [supported]
INFO:Neuron: => aten::feature_dropout: 1 [supported]
INFO:Neuron: => aten::flatten: 60 [supported]
INFO:Neuron: => aten::gelu: 52 [supported]
INFO:Neuron: => aten::layer_norm: 161 [supported]
INFO:Neuron: => aten::linear: 264 [supported]
INFO:Neuron: => aten::matmul: 104 [supported]
INFO:Neuron: => aten::mul: 52 [supported]
INFO:Neuron: => aten::permute: 210 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::reshape: 262 [supported]
INFO:Neuron: => aten::select: 104 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::size: 278 [supported]
INFO:Neuron: => aten::softmax: 52 [supported]
INFO:Neuron: => aten::transpose: 216 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
INFO:Neuron: => aten::view: 52 [supported]
Traceback (most recent call last):
  File "to_neuron.py", line 14, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs=[image.cuda()])
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
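The line `Compile command returned: -9` may be the most informative part of this log. Assuming the compiler was launched through Python's `subprocess` module (the traceback raises `subprocess.SubprocessError`), a negative return code on POSIX means the child process was terminated by that signal number. The helper below is a small sketch for decoding such codes. `describe_returncode` is a hypothetical name.

```python
import signal


def describe_returncode(returncode):
    """Describe a subprocess return code in plain words (POSIX convention)."""
    if returncode == 0:
        return "exited successfully"
    if returncode > 0:
        return f"exited with status {returncode}"
    # Negative values mean the process was terminated by signal -returncode.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"terminated by {name}"


print(describe_returncode(-9))
```

On Linux, signal 9 is SIGKILL. A compiler process killed this way was not rejecting the model itself. A common cause is the kernel's out-of-memory killer, so checking available RAM on the compile machine (for example via `dmesg`) would be a reasonable next step. This is an inference from the return code, not something the log states.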
1 answer, 1 vote, 84 views, asked 2 months ago

Is it possible to compile a neuron model in my local machine?

I'm following some guides and from my understanding this should be possible. But I've been trying for hours to compile a yolov5 model into a neuron model with no success. Is it even possible to do this in my local machine or do I have to be in an inferentia instance? This is what my environment looks like: ``` # packages in environment at /miniconda3/envs/neuron: # # Name Version Build Channel _libgcc_mutex 0.1 main _openmp_mutex 5.1 1_gnu absl-py 1.2.0 pypi_0 pypi astor 0.8.1 pypi_0 pypi attrs 22.1.0 pypi_0 pypi backcall 0.2.0 pyhd3eb1b0_0 ca-certificates 2022.07.19 h06a4308_0 cachetools 5.2.0 pypi_0 pypi certifi 2022.9.24 pypi_0 pypi charset-normalizer 2.1.1 pypi_0 pypi cycler 0.11.0 pypi_0 pypi debugpy 1.5.1 py37h295c915_0 decorator 5.1.1 pyhd3eb1b0_0 dmlc-nnvm 1.11.1.0+0 pypi_0 pypi dmlc-topi 1.11.1.0+0 pypi_0 pypi dmlc-tvm 1.11.1.0+0 pypi_0 pypi entrypoints 0.4 py37h06a4308_0 fonttools 4.37.3 pypi_0 pypi gast 0.2.2 pypi_0 pypi google-auth 2.12.0 pypi_0 pypi google-auth-oauthlib 0.4.6 pypi_0 pypi google-pasta 0.2.0 pypi_0 pypi gputil 1.4.0 pypi_0 pypi grpcio 1.49.1 pypi_0 pypi h5py 3.7.0 pypi_0 pypi idna 3.4 pypi_0 pypi importlib-metadata 4.12.0 pypi_0 pypi inferentia-hwm 1.11.0.0+0 pypi_0 pypi iniconfig 1.1.1 pypi_0 pypi ipykernel 6.15.2 py37h06a4308_0 ipython 7.34.0 pypi_0 pypi ipywidgets 8.0.2 pypi_0 pypi islpy 2021.1+aws2021.x.16.0.bld0 pypi_0 pypi jedi 0.18.1 py37h06a4308_1 jupyter_client 7.3.5 py37h06a4308_0 jupyter_core 4.10.0 py37h06a4308_0 jupyterlab-widgets 3.0.3 pypi_0 pypi keras-applications 1.0.8 pypi_0 pypi keras-preprocessing 1.1.2 pypi_0 pypi kiwisolver 1.4.4 pypi_0 pypi ld_impl_linux-64 2.38 h1181459_1 libffi 3.3 he6710b0_2 libgcc-ng 11.2.0 h1234567_1 libgomp 11.2.0 h1234567_1 libsodium 1.0.18 h7b6447c_0 libstdcxx-ng 11.2.0 h1234567_1 llvmlite 0.39.1 pypi_0 pypi markdown 3.4.1 pypi_0 pypi markupsafe 2.1.1 pypi_0 pypi matplotlib 3.5.3 pypi_0 pypi matplotlib-inline 0.1.6 py37h06a4308_0 ncurses 6.3 h5eee18b_3 nest-asyncio 1.5.5 py37h06a4308_0 
networkx 2.4 pypi_0 pypi neuron-cc 1.11.7.0+aec18907e pypi_0 pypi numba 0.56.2 pypi_0 pypi numpy 1.19.5 pypi_0 pypi oauthlib 3.2.1 pypi_0 pypi opencv-python 4.6.0.66 pypi_0 pypi openssl 1.1.1q h7f8727e_0 opt-einsum 3.3.0 pypi_0 pypi packaging 21.3 pyhd3eb1b0_0 pandas 1.3.5 pypi_0 pypi parso 0.8.3 pyhd3eb1b0_0 pexpect 4.8.0 pyhd3eb1b0_3 pickleshare 0.7.5 pyhd3eb1b0_1003 pillow 9.2.0 pypi_0 pypi pip 22.2.2 pypi_0 pypi pluggy 1.0.0 pypi_0 pypi prompt-toolkit 3.0.31 pypi_0 pypi protobuf 3.20.3 pypi_0 pypi psutil 5.9.2 pypi_0 pypi ptyprocess 0.7.0 pyhd3eb1b0_2 py 1.11.0 pypi_0 pypi pyasn1 0.4.8 pypi_0 pypi pyasn1-modules 0.2.8 pypi_0 pypi pygments 2.13.0 pypi_0 pypi pyparsing 3.0.9 py37h06a4308_0 pytest 7.1.3 pypi_0 pypi python 3.7.13 h12debd9_0 python-dateutil 2.8.2 pyhd3eb1b0_0 pytz 2022.2.1 pypi_0 pypi pyyaml 6.0 pypi_0 pypi pyzmq 23.2.0 py37h6a678d5_0 readline 8.1.2 h7f8727e_1 requests 2.28.1 pypi_0 pypi requests-oauthlib 1.3.1 pypi_0 pypi rsa 4.9 pypi_0 pypi scipy 1.4.1 pypi_0 pypi seaborn 0.12.0 pypi_0 pypi setuptools 59.8.0 pypi_0 pypi six 1.16.0 pyhd3eb1b0_1 sqlite 3.39.3 h5082296_0 tensorboard 1.15.0 pypi_0 pypi tensorboard-data-server 0.6.1 pypi_0 pypi tensorboard-plugin-wit 1.8.1 pypi_0 pypi tensorflow 1.15.0 pypi_0 pypi tensorflow-estimator 1.15.1 pypi_0 pypi termcolor 2.0.1 pypi_0 pypi thop 0.1.1-2209072238 pypi_0 pypi tk 8.6.12 h1ccaba5_0 tomli 2.0.1 pypi_0 pypi torch 1.11.0 pypi_0 pypi torch-neuron 1.11.0.2.3.0.0 pypi_0 pypi torchvision 0.12.0 pypi_0 pypi tornado 6.2 py37h5eee18b_0 tqdm 4.64.1 pypi_0 pypi traitlets 5.4.0 pypi_0 pypi typing-extensions 4.3.0 pypi_0 pypi urllib3 1.26.12 pypi_0 pypi wcwidth 0.2.5 pyhd3eb1b0_0 werkzeug 2.2.2 pypi_0 pypi wheel 0.37.1 pypi_0 pypi widgetsnbextension 4.0.3 pypi_0 pypi wrapt 1.14.1 pypi_0 pypi xz 5.2.6 h5eee18b_0 zeromq 4.3.4 h2531618_0 zipp 3.8.1 pypi_0 pypi zlib 1.2.12 h5eee18b_3 ```
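Compilation is generally described as not requiring an Inferentia device, only the compiler packages, so a quick sanity check is whether the relevant distributions are visible from the interpreter that runs the compile script. The sketch below uses the standard `importlib.metadata` module (Python 3.8+; on Python 3.7, as in the listing above, the `importlib_metadata` backport would be the equivalent). `installed_versions` is a hypothetical helper, and the distribution names are taken from the listing.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as they appear in the environment listing above.
for name, version in installed_versions(["torch", "torch-neuron", "neuron-cc"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If all three report a version from inside the `neuron` environment, a missing package is probably not the problem, and the actual compiler error output would be the next thing to look at.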
1 answer, 2 votes, 57 views, asked 2 months ago

Not able to compile to NEFF, the BERT model from neuron tutorial

Hi Team,

I wanted to compile a BERT model and run it on Inferentia. I trained my model using PyTorch and tried to convert it by following the steps in this [tutorial](https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb) on my Amazon Linux machine. But I keep getting a failure with this error:

```
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
```

I followed the installation steps [here](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-install.html) for pytorch-1.11.0 and tried to execute the code in the tutorial, but got the same error. We wanted to explore using Inferentia for our large BERT model but are blocked by the failure to convert to NEFF format. I also tried following the steps using TF and ran into a different unsupported-ops issue. Could you please help!
Below are the setup commands I ran on my Amazon Linux desktop:

```
sudo yum install -y python3.7-venv gcc-c++
python3.7 -m venv pytorch_venv
source pytorch_venv/bin/activate
pip install -U pip

# Set Pip repository to point to the Neuron repository
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

# Install Neuron PyTorch
pip install torch-neuron neuron-cc[tensorflow] "protobuf<4" torchvision

pip install --upgrade "transformers==4.6.0"
pip install tensorflow==2.8.1
```

and then executed the script below (copied from the tutorial) on my Amazon Linux host:

```
import tensorflow  # to workaround a protobuf version conflict issue
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import transformers
import os
import warnings

# Setting up NeuronCore groups for inf1.6xlarge with 16 cores
num_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge
nc_env = ','.join(['1'] * num_cores)
warnings.warn("NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please \
see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes \
for more details.", DeprecationWarning)
os.environ['NEURONCORE_GROUP_SIZES'] = nc_env

# Build tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

max_length=128
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")

# Run the original PyTorch model on compilation example
paraphrase_classification_logits = model(**paraphrase)[0]

# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']

# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron
model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)
```

This gave me the following error:

```
2022-09-22 18:13:12.145617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 18:13:12.145649: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
sample_pytorch_model.py:14: DeprecationWarning: NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes for more details.
  for more details.", DeprecationWarning)
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 433/433 [00:00<00:00, 641kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████| 213k/213k [00:00<00:00, 636kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████| 436k/436k [00:00<00:00, 731kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████| 29.0/29.0 [00:00<00:00, 35.2kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████| 433M/433M [00:09<00:00, 45.7MB/s]
/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/transformers/modeling_utils.py:1968: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$662 with neuron-cc
INFO:Neuron:Compiling with command line: '/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
  - Avoid using `tokenizers` before the fork if possible
  - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
.2022-09-22 18:13:52.697717: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 18:13:52.697749: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
Compiler status ERROR
INFO:Neuron:Compile command returned: 1
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$662; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/local/home/spareek/pytorch_venv/bin/neuron-cc compile /tmp/tmp64l9ygmj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp64l9ygmj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
Traceback (most recent call last):
  File "sample_pytorch_model.py", line 38, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/local/home/spareek/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
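The log above is long, and the two lines that name the failing operators are easy to miss. As a rough sketch (not part of the original post), the snippet below pulls them out with regular expressions. The helper names `unimplemented_ops` and `unsupported_aten_ops` are hypothetical, and the patterns assume the message wording seen in this one log; other Neuron versions may phrase these lines differently.

```python
import re

# A few lines copied from the log in this question; the full log could be
# read from a file instead.
LOG = """\
09/22/2022 06:13:56 PM ERROR 23737 [neuron-cc]: Failed to parse model /tmp/tmp64l9ygmj/graph_def.pb: The following operators are not implemented: {'SelectV2'} (NotImplementedError)
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
"""


def unimplemented_ops(log: str) -> set:
    """Return operator names that neuron-cc reported as not implemented.

    Hypothetical helper: assumes the "{'OpA', 'OpB'}" set notation seen above.
    """
    ops = set()
    for body in re.findall(r"operators are not implemented: \{([^}]*)\}", log):
        ops.update(re.findall(r"'([^']+)'", body))
    return ops


def unsupported_aten_ops(log: str) -> dict:
    """Map each aten op flagged '[not supported]' to its reported count."""
    pattern = r"=> (aten::\w+): (\d+) \[not supported\]"
    return {name: int(count) for name, count in re.findall(pattern, log)}


print(unimplemented_ops(LOG))
print(unsupported_aten_ops(LOG))
```

For this log the two results differ: the compiler rejects `SelectV2` while parsing the graph, whereas `aten::embedding` is only listed as unsupported by the partitioner. Separating them makes it clearer which message actually aborted the trace.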
1 answer · 0 votes · 60 views · asked 3 months ago

What is a practical Inferentia limit to model size?

I am trying to test a model compiled for Inferentia on an `inf1.2xlarge`, but when loading the model I receive the following error messages:

```
2022-Sep-15 22:10:01.0152 3802:3802 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 1073741824
2022-Sep-15 22:10:01.0152 3802:3802 ERROR TDRV:dma_ring_alloc Failed to allocate TX ring
2022-Sep-15 22:10:01.0172 3802:3802 ERROR TDRV:io_create_rings Failed to allocate io ring for queue qPoolOut0_0
2022-Sep-15 22:10:01.0172 3802:3802 ERROR TDRV:kbl_model_add create_io_rings() error
2022-Sep-15 22:10:01.0182 3802:3802 ERROR NMGR:dlr_kelf_stage Failed to load subgraph
2022-Sep-15 22:10:01.0182 3802:3802 ERROR NMGR:stage_kelf_models Failed to stage graph: kelf-a.json to NeuronCore
2022-Sep-15 22:10:01.0184 3802:3802 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: 1.11.7.0+aec18907e-/tmp/tmptkxl8amn, err: 4
```

These are wrapped into a Python runtime exception:

```
RuntimeError: Could not load the model status=4 message=Allocation Failure
```

I presume that this is because the model is on the large side. The `.neff` file is 373MB and takes ~4 hours to compile for a batch size of 1. This particular model is compiled for a single NeuronCore.

I am now trying to compile with `--neuroncore-pipeline-cores 4` to spread the model across multiple cores. This, however, gives me the following log message:

```
INFO: The requested number of neuroncore-pipeline-cores (4) may not be suitable for this network, and may lead to sub-optimal performance. Recommended neuroncore-pipeline-cores for this network is 1.
```

(I can't find any technical details on how much memory an Inferentia chip has, although I'm guessing that due to the Inferentia architecture "memory" is not used in the same way as it might be on a CPU or GPU.)

So, what is a practical size limit for an Inferentia model, and what can I do about running this model on Inf1?
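The `dmem_alloc` lines report the failed request in raw bytes, which is hard to compare with the 373MB `.neff` size at a glance. The sketch below (an illustration, not something from the original post) converts those byte counts to MiB. The helper name `failed_allocations_mib` is hypothetical, and the pattern assumes the exact wording of the runtime message shown in this question.

```python
import re

# Two lines copied from the runtime log in this question.
LOG = """\
2022-Sep-15 22:10:01.0152 3802:3802 ERROR TDRV:dmem_alloc Failed to alloc DEVICE memory: 1073741824
2022-Sep-15 22:10:01.0152 3802:3802 ERROR TDRV:dma_ring_alloc Failed to allocate TX ring
"""


def failed_allocations_mib(log: str) -> list:
    """Return the size in MiB of every failed device allocation in the log.

    Hypothetical helper: assumes the 'Failed to alloc DEVICE memory: <bytes>'
    wording seen above.
    """
    pattern = r"Failed to alloc DEVICE memory: (\d+)"
    return [int(size) / 2**20 for size in re.findall(pattern, log)]


print(failed_allocations_mib(LOG))  # [1024.0]
```

Here the single failed request is 1024 MiB (1 GiB) for a TX ring on an output queue, which is larger than the `.neff` file itself. That suggests the I/O buffers, and not only the compiled weights, are worth including when estimating how much device memory a model needs.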
1 answer · 0 votes · 101 views · ntw-au · asked 3 months ago