Inferentia: only 1 NeuronCore is used when 4 are available


I'm using the following code to load a Neuron-compiled model for inference. However, on my inf1.2xlarge instance, neuron-top shows four cores (NC0 to NC3), and only NC0 is used during inference. How do I increase throughput by using all four cores?

from transformers import BertTokenizer, BertModel
import torch
import torch_neuron
import os.path
import os

os.environ['NEURON_RT_NUM_CORES']=str(4)
fname = 'modelneuron.pt'
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute and big", return_tensors="pt")
if not os.path.isfile(fname):

    model = BertModel.from_pretrained('bert-base-uncased', return_dict=False)

    neuron_model = torch_neuron.trace(model,
                                    example_inputs = (inputs['input_ids'],inputs['attention_mask']))
    neuron_model.save("modelneuron.pt")
    print('saved neuron model')
else:
    neuron_model = torch.jit.load('modelneuron.pt')
    print('loaded neuron model')

for i in range(10000):
    outputs = neuron_model(*(inputs['input_ids'],inputs['attention_mask']))

print(outputs)
asked a year ago · 284 views
1 Answer

Hi,

To run inference in parallel on an Inf1 instance and utilize all available NeuronCores, you can use torch.neuron.DataParallel.

torch.neuron.DataParallel implements data parallelism at the module level by duplicating the Neuron model on all available NeuronCores and distributing data across the different cores for parallelized inference.

You can read more about running inference with torch.neuron.DataParallel here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/api-torch-neuron-dataparallel-api.html#torch-neuron-dataparallel-api

In addition, here is an example of using DataParallel: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/resnet50.html#Run-Inference-using-torch.neuron.DataParallel
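
As a rough illustration, the script from the question might be adapted along these lines. This is only a sketch, not a tested recipe: it assumes the file modelneuron.pt saved by the question's script, the helper names `chunked` and `run_parallel` are made up for this example, and using a batch size equal to the number of cores is a guess that you would want to tune. Because DataParallel splits the input along the batch dimension (dim 0), each call needs more than one sample in the batch to give the other cores any work.

```python
NUM_CORES = 4                  # inf1.2xlarge shows NC0 to NC3, per the question
MODEL_PATH = "modelneuron.pt"  # file saved by the question's script


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_parallel(sentences, batch_size=NUM_CORES):
    """Run the compiled model over `sentences`, one batch per forward call.

    Assumption: every sentence tokenizes to the same sequence length that the
    model was traced with, since the compiled model has a fixed input shape.
    """
    # Imported lazily so this file can be imported without the Neuron SDK.
    import torch
    import torch_neuron  # noqa: F401  (makes torch.neuron available)
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # Wrap the loaded model; by default it is replicated on all visible cores.
    model = torch.neuron.DataParallel(torch.jit.load(MODEL_PATH))

    outputs = []
    for batch in chunked(sentences, batch_size):
        enc = tokenizer(batch, return_tensors="pt")
        # The batch dimension (dim 0) is what gets split across NeuronCores.
        outputs.append(model(enc["input_ids"], enc["attention_mask"]))
    return outputs
```

For example, `run_parallel(["Hello, my dog is cute and big"] * 64)` would issue 16 forward calls of 4 samples each. Watching neuron-top while it runs should show whether NC1 to NC3 are now busy. Larger batches per call may give better throughput.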

AWS
Chris_T
answered a year ago