Thank you for your interest in Neuron. This answer assumes that you followed the tutorial you linked, compiled the Neuron model inside the underlying HuggingFace pipeline, and saved it to disk, and that the code in your post is now trying to reload that compiled model and run inference on it.
If you followed the tutorial, these two lines of code were executed:
# compile the wrapped model and save it to disk
model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list)
model_wrapped_traced.save('./distilbert_b128')
This saves an already traced (compiled) model to disk. The saved model accepts inputs in the form of a list (remember, this list format is required in order to use the neuron_model.save() function). Also remember that HuggingFace tokenizers take string inputs and convert them into dictionaries, not lists.
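Here is a minimal sketch of that mismatch (assuming the distilbert checkpoint from the tutorial; substitute your own model name):

from transformers import AutoTokenizer

# the tokenizer turns strings into a dictionary of tensors...
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
encoded = tokenizer(['I love to eat pizza!'], padding='max_length', max_length=128, truncation=True, return_tensors='tf')
print(encoded.keys())  # dict_keys(['input_ids', 'attention_mask'])

# ...but the traced model was saved to accept a list of tensors:
example_inputs_list = [encoded['input_ids'], encoded['attention_mask']]

This is why the wrapper class further below converts dictionary inputs back into the list form the traced model expects.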
In your code it looks like you reload the model with reloaded_model = tf.keras.models.load_model('./distilbert_b128_2'). Note that because this model was already compiled and then saved (see the two lines of code above), it does not need to be recompiled. It also looks like you attempted to compile another model with this line: model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list). This isn't necessary, because you already have a compiled Neuron model saved to disk. One of the main benefits of saving Neuron models is avoiding the time-consuming recompilation step.
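In other words, the reload step on its own is just this (a minimal sketch, using the save path from the tutorial):

import tensorflow as tf

# reload the already-compiled Neuron model; no tfn.trace() call is needed
reloaded_model = tf.keras.models.load_model('./distilbert_b128')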
Try something like this (note the fix near the end: the wrapper class should wrap the reloaded model, not a newly traced one):
import tensorflow as tf
from transformers import pipeline

# First we define a wrapper class that wraps a model that accepts list inputs
# (what we needed in order to originally save the model)
# into a model that accepts dictionary inputs
# (what the huggingface tokenizer outputs)
class TFBertForSequenceClassificationDictIO(tf.keras.Model):
    def __init__(self, model_wrapped):
        super().__init__()
        self.model_wrapped = model_wrapped
        self.aws_neuron_function = model_wrapped.aws_neuron_function

    def call(self, inputs):
        input_ids = inputs['input_ids']
        attention_mask = inputs['attention_mask']
        logits = self.model_wrapped([input_ids, attention_mask])
        return [logits]

# Since we are in a new process we have to recreate our neuron pipeline
# with its special tokenizer that makes sure all the inputs have the same shape
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'  # assuming the checkpoint from the tutorial
neuron_pipe = pipeline('sentiment-analysis', model=model_name, framework='tf')

# the first step is to modify the underlying tokenizer to create a static
# input shape as inferentia does not work with dynamic input shapes
original_tokenizer = neuron_pipe.tokenizer

# you intercept the function call to the original tokenizer
# and inject our own code to modify the arguments
def wrapper_function(*args, **kwargs):
    kwargs['padding'] = 'max_length'
    # this is the key line here to set a static input shape
    # so that all inputs are set to a len of 128
    kwargs['max_length'] = 128
    kwargs['truncation'] = True
    kwargs['return_tensors'] = 'tf'
    return original_tokenizer(*args, **kwargs)

# insert our wrapper function as the new tokenizer as well
# as reinserting back some attribute information that was lost
# when you replaced the original tokenizer with our wrapper function
neuron_pipe.tokenizer = wrapper_function
neuron_pipe.tokenizer.decode = original_tokenizer.decode
neuron_pipe.tokenizer.mask_token_id = original_tokenizer.mask_token_id
neuron_pipe.tokenizer.pad_token_id = original_tokenizer.pad_token_id
neuron_pipe.tokenizer.convert_ids_to_tokens = original_tokenizer.convert_ids_to_tokens

# keep the original model's config around before we swap the model out
model_config = neuron_pipe.model.config

# now wrap the neuron model that you reloaded with the wrapper class
# (the fix: wrap the reloaded model, not a freshly traced one, so nothing is recompiled)
reloaded_model = tf.keras.models.load_model('./distilbert_b128_2')
rewrapped_model = TFBertForSequenceClassificationDictIO(reloaded_model)

# now you can reinsert our reloaded model back into our pipeline
neuron_pipe.model = rewrapped_model
neuron_pipe.model.config = model_config

# Run inference on your reloaded model (that you didn't have to recompile)
# Our example data!
string_inputs = [
    'I love to eat pizza!',
    'I am sorry. I really want to like it, but I just can not stand sushi.',
    'I really do not want to type out 128 strings to create batch 128 data.',
    'Ah! Multiplying this list by 32 would be a great solution!',
]
string_inputs = string_inputs * 32

outputs = neuron_pipe(string_inputs)
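If everything is wired up correctly, outputs is a standard HuggingFace pipeline result: one label/score dictionary per input string (the exact scores depend on your model). For example:

# e.g. [{'label': 'POSITIVE', 'score': 0.99...}, ...]
for text, result in zip(string_inputs[:2], outputs[:2]):
    print(text, '->', result)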