Inferentia NeuronCore usage is only 1 when 4 cores are available


I'm using the following code to load a Neuron-compiled model for inference. However, on my inf1.2xlarge instance, neuron-top shows four cores (NC0 to NC3), and only NC0 is used during inference. How do I increase throughput by using all cores?

from transformers import BertTokenizer, BertModel
import torch
import torch_neuron
import os

# Ask the Neuron runtime for all four NeuronCores on the instance
os.environ['NEURON_RT_NUM_CORES'] = str(4)
fname = 'modelneuron.pt'
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute and big", return_tensors="pt")
if not os.path.isfile(fname):
    # First run: compile the model for Neuron and cache it on disk
    model = BertModel.from_pretrained('bert-base-uncased', return_dict=False)

    neuron_model = torch_neuron.trace(model,
                                    example_inputs = (inputs['input_ids'],inputs['attention_mask']))
    neuron_model.save(fname)
    print('saved neuron model')
else:
    # Later runs: load the already compiled model
    neuron_model = torch.jit.load(fname)
    print('loaded neuron model')

# Repeated single-sample inference; only NC0 shows activity in neuron-top
for i in range(10000):
    outputs = neuron_model(inputs['input_ids'], inputs['attention_mask'])

print(outputs)
Asked a year ago · 285 views
1 answer

Hi,

To run inference in parallel on an inf1 instance and use all available NeuronCores, you can use torch.neuron.DataParallel.

torch.neuron.DataParallel implements data parallelism at the module level by duplicating the Neuron model on all available NeuronCores and distributing data across the different cores for parallelized inference.
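As a minimal sketch of how this could apply to the code in the question: the loaded model is wrapped in torch.neuron.DataParallel and called with a batch of four samples instead of one, so that there is data to distribute across the cores. The helper names `total_batch_size` and `run_on_all_cores` are hypothetical, and the sketch assumes the model was traced with batch size 1, that the default dynamic batching accepts the larger batch, and that the input sequence length matches the traced example. It is not tested on Inferentia hardware here.

```python
def total_batch_size(num_cores, per_core_batch=1):
    """Batch size that hands each NeuronCore `per_core_batch` samples.

    Hypothetical helper for this sketch, not part of torch_neuron.
    """
    if num_cores < 1 or per_core_batch < 1:
        raise ValueError("num_cores and per_core_batch must be positive")
    return num_cores * per_core_batch


def run_on_all_cores(model_path="modelneuron.pt", num_cores=4, iterations=10000):
    """Load the compiled model and run batched inference through DataParallel."""
    # Imported lazily: these packages are only present on a Neuron-enabled host.
    import torch
    import torch_neuron  # noqa: F401  (importing it makes torch.neuron available)
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    inputs = tokenizer("Hello, my dog is cute and big", return_tensors="pt")

    # Wrap the compiled model so it is replicated on the available NeuronCores.
    neuron_model = torch.jit.load(model_path)
    model_parallel = torch.neuron.DataParallel(neuron_model)

    # Assumption: the model was traced with batch size 1, so a batch of
    # `num_cores` samples gives each core one sample per call.
    batch = total_batch_size(num_cores)
    input_ids = inputs["input_ids"].repeat(batch, 1)
    attention_mask = inputs["attention_mask"].repeat(batch, 1)

    outputs = None
    for _ in range(iterations):
        outputs = model_parallel(input_ids, attention_mask)
    return outputs
```

Calling `run_on_all_cores()` on the inf1.2xlarge instance while watching neuron-top should show whether NC1 to NC3 pick up work. In a real workload the batch would hold different sentences padded to the traced sequence length rather than one sentence repeated.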

You can read more about running inference with torch.neuron.DataParallel here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/api-torch-neuron-dataparallel-api.html#torch-neuron-dataparallel-api

In addition, here is an example of using DataParallel: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/resnet50.html#Run-Inference-using-torch.neuron.DataParallel

AWS
Chris_T
Answered a year ago
