Hi Ajay,
Thank you for your interest in Neuron. This response assumes that you followed the tutorial you linked, compiled the Neuron model in the underlying HuggingFace pipeline, and saved it to disk. It also assumes that the code in your post is trying to reload that compiled model and run inference on it.
If you followed the tutorial then these two lines of code were executed:
#compile the wrapped model and save it to disk
model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list)
model_wrapped_traced.save('./distilbert_b128')
This saves an already traced (compiled) model to disk. The saved model accepts its input as a list, which is required in order to use the neuron_model.save() function. Remember that HuggingFace tokenizers take string inputs and convert them into dictionaries.
In your code it looks like you reload the model with reloaded_model = tf.keras.models.load_model('./distilbert_b128_2'). Note that if this model was already compiled and then saved (see the two lines of code above), it does not need to be recompiled. It looks like you have attempted to compile another model with this line of code: model_wrapped_traced = tfn.trace(model_wrapped, example_inputs_list). This isn't necessary, as you already have a compiled Neuron model saved to disk. One of the main benefits of saving Neuron models is avoiding the time-consuming recompilation step.
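The "compile once, reload afterwards" idea can be sketched in a framework-independent way. This is only a minimal illustration, not part of the tutorial: load_or_compile, load_fn and compile_fn are hypothetical names, and in practice load_fn would be something like tf.keras.models.load_model while compile_fn would trace the model and save it to the given path.

```python
import os


def load_or_compile(path, load_fn, compile_fn):
    """Reuse a saved model if one exists at `path`, otherwise compile it.

    load_fn(path)    -> model  (hypothetical hook, e.g. tf.keras.models.load_model)
    compile_fn(path) -> model  (hypothetical hook: trace the model and save it to `path`)
    """
    if os.path.exists(path):
        # A compiled model is already on disk, so skip the slow compile step.
        return load_fn(path)
    # Nothing saved yet: compile once, so that later runs can reload it.
    return compile_fn(path)
```

With this shape, the expensive trace call only runs the first time; every later process goes straight to the load branch.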
Try something like this:
#imports needed in the new process (tfn is imported as in the tutorial)
import tensorflow as tf
import tensorflow.neuron as tfn
from transformers import pipeline

#First we define a wrapper class that wraps a model that accepts list inputs
# (what we needed in order to originally save the model)
# into a model that accepts dictionary inputs
# (what the huggingface tokenizer outputs)
class TFBertForSequenceClassificationDictIO(tf.keras.Model):
    def __init__(self, model_wrapped):
        super().__init__()
        self.model_wrapped = model_wrapped
        self.aws_neuron_function = model_wrapped.aws_neuron_function

    def call(self, inputs):
        input_ids = inputs['input_ids']
        attention_mask = inputs['attention_mask']
        logits = self.model_wrapped([input_ids, attention_mask])
        return [logits]
#Since we are in a new process we have to recreate our neuron pipeline
# with its special tokenizer that makes sure all the inputs have the same shape
#(model_name must be defined as in the tutorial, i.e. the model that was compiled)
neuron_pipe = pipeline('sentiment-analysis', model=model_name, framework='tf')
#the first step is to modify the underlying tokenizer to create a static
#input shape as inferentia does not work with dynamic input shapes
#(the tutorial's original `pipe` does not exist in a new process,
# so take the tokenizer from the pipeline we just created)
original_tokenizer = neuron_pipe.tokenizer
#you intercept the function call to the original tokenizer
#and inject our own code to modify the arguments
def wrapper_function(*args, **kwargs):
    kwargs['padding'] = 'max_length'
    #this is the key line here to set a static input shape
    #so that all inputs are set to a len of 128
    kwargs['max_length'] = 128
    kwargs['truncation'] = True
    kwargs['return_tensors'] = 'tf'
    return original_tokenizer(*args, **kwargs)
#insert our wrapper function as the new tokenizer as well
#as reinserting back some attribute information that was lost
#when you replaced the original tokenizer with our wrapper function
neuron_pipe.tokenizer = wrapper_function
neuron_pipe.tokenizer.decode = original_tokenizer.decode
neuron_pipe.tokenizer.mask_token_id = original_tokenizer.mask_token_id
neuron_pipe.tokenizer.pad_token_id = original_tokenizer.pad_token_id
neuron_pipe.tokenizer.convert_ids_to_tokens = original_tokenizer.convert_ids_to_tokens
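#now wrap your neuron model that you reloaded with the wrapper class
#(wrap reloaded_model, not model_wrapped_traced, which only exists if you recompile;
# this assumes the reloaded model exposes aws_neuron_function like the traced model does)
reloaded_model = tf.keras.models.load_model('./distilbert_b128_2')
rewrapped_model = TFBertForSequenceClassificationDictIO(reloaded_model)
#keep the pipeline's config before swapping in the reloaded model
original_config = neuron_pipe.model.config
#now you can reinsert our reloaded model back into our pipeline
neuron_pipe.model = rewrapped_model
neuron_pipe.model.config = original_config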
#Run inference on your reloaded model (that you didn't have to recompile)
#Our example data!
string_inputs = [
    'I love to eat pizza!',
    'I am sorry. I really want to like it, but I just can not stand sushi.',
    'I really do not want to type out 128 strings to create batch 128 data.',
    'Ah! Multiplying this list by 32 would be a great solution!',
]
string_inputs = string_inputs * 32
outputs = neuron_pipe(string_inputs)
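If you want to see what the tokenizer interception above does without Neuron hardware or HuggingFace installed, here is a small sketch. Note that toy_tokenizer is a made-up stand-in (whitespace "tokens", pad id 0), not the real HuggingFace tokenizer, and make_static_shape_tokenizer is a hypothetical helper name. It only shows how forcing padding, max_length and truncation in the wrapper gives every input the same length.

```python
def make_static_shape_tokenizer(tokenizer, max_length=128):
    """Wrap `tokenizer` so every call is padded/truncated to `max_length`."""
    def wrapper(*args, **kwargs):
        # Override whatever the caller passed so the output shape is static.
        kwargs['padding'] = 'max_length'
        kwargs['max_length'] = max_length
        kwargs['truncation'] = True
        return tokenizer(*args, **kwargs)
    return wrapper


def toy_tokenizer(texts, padding=False, max_length=None, truncation=False):
    """Hypothetical stand-in for a real tokenizer: one fake id per word."""
    input_ids, attention_mask = [], []
    for text in texts:
        ids = [len(word) for word in text.split()]  # fake token ids
        if truncation and max_length is not None:
            ids = ids[:max_length]
        mask = [1] * len(ids)
        if padding == 'max_length' and max_length is not None:
            pad = max_length - len(ids)
            ids = ids + [0] * pad
            mask = mask + [0] * pad
        input_ids.append(ids)
        attention_mask.append(mask)
    return {'input_ids': input_ids, 'attention_mask': attention_mask}


static_tokenizer = make_static_shape_tokenizer(toy_tokenizer, max_length=8)
encoded = static_tokenizer(['I love to eat pizza!', 'a b c d e f g h i j'])
shapes = [len(row) for row in encoded['input_ids']]
print(shapes)  # prints [8, 8]: the short input is padded, the long one truncated
```

The real wrapper in the answer does the same thing with max_length set to 128, which is the sequence length the model was compiled for.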
Hi Ajay, if you want the complete code, I would recommend following the tutorial exactly. Please see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup.html#tens[…]etup for how to set up the tutorial. HuggingFace is not required to use Neuron, but it does offer lots of useful features. The bert-large-uncased model https://huggingface.co/bert-large-uncased is Google's BERT wrapped with HuggingFace features. If you cannot use HuggingFace, we may be able to assist with a different solution.
Couldn't get it to work. After filling in the gaps I got this: StagingError: Exception encountered when calling layer "tf_bert_for_sequence_classification_flat_io" (type TFBertForSequenceClassificationFlatIO).
Could you please share the complete code with me? Since I added my own code to fill the gaps in your answer, I might have introduced more mistakes.
Also, I need Neuron compilation of a Google BERT large model with sequence length 512. Can I do this without referring to HF code?