RuntimeError: The PyTorch Neuron Runtime could not be initialized

I just started using Neuron on Inf1 and am following the examples. The resnet50 example ran without problems, but when I tried the BERT example I got the error below. I went through the troubleshooting steps: Neuron is installed, and I haven't seen any of the other errors.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_27071/3834338657.py in <module>
     35 
     36 # Verify the TorchScript works on both example inputs
---> 37 paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)
     38 not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)
     39 

~/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(372): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(548): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
/tmp/ipykernel_27071/3834338657.py(34): <module>
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged
to your system logs. See the Neuron Runtime's troubleshooting guide for help on this
topic: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/

When I searched the system log (EC2 > Instances > ... > Get system log) for "neuron", I got these results:

[  264.104973] neuron: loading out-of-tree module taints kernel.
[  264.107388] neuron: module verification failed: signature and/or required key missing - tainting kernel
[  264.113673] Neuron Driver Started with Version:2.3.26.0-67ad286904ed6cc43a8761d89c8477de0ba961e1
[  264.119896] neuron:nr_reset_thread_fn: nd0: initiating reset
[  264.139198] neuron:mpset_constructor: reserved 134217728 bytes of host memory
[  271.253090] neuron:nr_reset_thread_fn: nd0: reset completed
[ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
[ 2069.531541] neuron:npid_detach: neuron:npid_detach: pid=25574, slot=0
[ 2093.038724] neuron:npid_attach: neuron:npid_attach: pid=25666, slot=0
[ 2266.011350] neuron:npid_detach: neuron:npid_detach: pid=25666, slot=0
[ 2271.008477] neuron:npid_attach: neuron:npid_attach: pid=25777, slot=0
[ 4139.513592] neuron:npid_detach: neuron:npid_detach: pid=25777, slot=0
nikolay
asked 2 years ago · 930 views
1 Answer
Accepted Answer

When I executed the notebook cell that runs the compiled script, the Jupyter process printed this:

ERROR   NRT:nrt_allocate_neuron_cores               NeuronCore(s) not available - Requested:4 Available:0

The problem was that the notebook from the previous example I followed was still running and holding the NeuronCores, so none were available for the new one (Requested:4 Available:0).

I feel silly now :)
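
For anyone hitting the same thing, here is a minimal sketch of how the two log formats quoted in this thread could be checked programmatically. It pairs the driver's `npid_attach` / `npid_detach` lines to find processes that never released the device, and pulls the requested and available core counts out of the NRT error. The helper names `attached_pids` and `core_shortfall` are made up for illustration, and the regular expressions are based only on the lines shown here, so other driver or runtime versions may need different patterns.

```python
import re

# Patterns derived only from the log lines quoted in this thread (assumption:
# other Neuron driver/runtime versions may format these messages differently).
ATTACH_RE = re.compile(r"neuron:npid_(attach|detach):.*?pid=(\d+)")
NRT_RE = re.compile(r"Requested:(\d+)\s+Available:(\d+)")


def attached_pids(log_text):
    """Return the set of PIDs that attached to the driver and never detached."""
    attached = set()
    for line in log_text.splitlines():
        match = ATTACH_RE.search(line)
        if not match:
            continue
        event, pid = match.group(1), int(match.group(2))
        if event == "attach":
            attached.add(pid)
        else:
            attached.discard(pid)
    return attached


def core_shortfall(error_line):
    """Return (requested, available) from an NRT allocation error, or None."""
    match = NRT_RE.search(error_line)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Shortened copy of the driver log above, cut off before the last detach
# to illustrate a process that is still attached.
sample = """\
[ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
[ 2069.531541] neuron:npid_detach: neuron:npid_detach: pid=25574, slot=0
[ 2093.038724] neuron:npid_attach: neuron:npid_attach: pid=25666, slot=0
"""
print(attached_pids(sample))  # → {25666}

error = ("ERROR   NRT:nrt_allocate_neuron_cores               "
         "NeuronCore(s) not available - Requested:4 Available:0")
print(core_shortfall(error))  # → (4, 0)
```

Any PID this reports is a candidate for the process holding the cores, in my case the kernel of the earlier notebook. Shutting that kernel down should free the cores for the next model.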

nikolay
answered 2 years ago
