RuntimeError: The PyTorch Neuron Runtime could not be initialized


I just started using Neuron on Inf1 and am working through the examples. The resnet50 example ran without problems, but when I tried the BERT example I got the error below. I followed these troubleshooting steps: Neuron is installed, and I haven't seen any of the other errors.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_27071/3834338657.py in <module>
     35 
     36 # Verify the TorchScript works on both example inputs
---> 37 paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)
     38 not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)
     39 

~/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(372): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(548): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
/tmp/ipykernel_27071/3834338657.py(34): <module>
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged
to your system logs. See the Neuron Runtime's troubleshooting guide for help on this
topic: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/

When I searched the system log (EC2 > Instances > ... > Get system log) for "neuron", I got these results:

[  264.104973] neuron: loading out-of-tree module taints kernel.
[  264.107388] neuron: module verification failed: signature and/or required key missing - tainting kernel
[  264.113673] Neuron Driver Started with Version:2.3.26.0-67ad286904ed6cc43a8761d89c8477de0ba961e1
[  264.119896] neuron:nr_reset_thread_fn: nd0: initiating reset
[  264.139198] neuron:mpset_constructor: reserved 134217728 bytes of host memory
[  271.253090] neuron:nr_reset_thread_fn: nd0: reset completed
[ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
[ 2069.531541] neuron:npid_detach: neuron:npid_detach: pid=25574, slot=0
[ 2093.038724] neuron:npid_attach: neuron:npid_attach: pid=25666, slot=0
[ 2266.011350] neuron:npid_detach: neuron:npid_detach: pid=25666, slot=0
[ 2271.008477] neuron:npid_attach: neuron:npid_attach: pid=25777, slot=0
[ 4139.513592] neuron:npid_detach: neuron:npid_detach: pid=25777, slot=0
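The attach/detach pairs in this log can be matched up to see whether any process is still attached to the device. The following is a minimal sketch, not part of the Neuron tooling: `attached_pids` is a hypothetical helper, the regular expression is modeled only on the `npid_attach` / `npid_detach` lines shown above, and reading "attach without a later detach" as "still holding the device" is an assumption about what the driver logs.

```python
import re

# Matches lines such as:
#   [ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
# The format is taken from the log excerpt above; other driver versions may differ.
_NPID_RE = re.compile(r"neuron:npid_(attach|detach): pid=(\d+), slot=(\d+)")


def attached_pids(log_text):
    """Return {pid: slot} for PIDs with an attach line and no later detach line."""
    attached = {}
    for line in log_text.splitlines():
        match = _NPID_RE.search(line)
        if match is None:
            continue  # not an attach/detach line
        event = match.group(1)
        pid = int(match.group(2))
        slot = int(match.group(3))
        if event == "attach":
            attached[pid] = slot
        else:
            attached.pop(pid, None)
    return attached
```

For the excerpt above, every attach has a matching detach, so the function returns an empty dict. A log captured while another process still holds the device would be expected to leave that PID in the result.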
nikolay
asked a year ago, 642 views
1 Answer
Accepted Answer

When I executed the notebook cell that runs the compiled script, the Jupyter process printed this output:

ERROR   NRT:nrt_allocate_neuron_cores               NeuronCore(s) not available - Requested:4 Available:0

The problem was that the notebook from the previous example I had followed was still running, so no NeuronCores were available to the new one.

I feel silly now :)
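To spot a leftover process like that before rerunning the cell, one option is to look for processes that still have a Neuron device file open. This is a rough sketch under stated assumptions: `pids_holding_device` is a hypothetical helper, it assumes a Linux `/proc` filesystem, and it assumes the driver exposes device files under a path beginning with `/dev/neuron` and that a process using NeuronCores keeps one open.

```python
import os


def pids_holding_device(proc_root="/proc", device_prefix="/dev/neuron"):
    """Return sorted PIDs with an open file descriptor whose target starts with device_prefix.

    device_prefix is an assumption about how the Neuron driver names its device files.
    """
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if target.startswith(device_prefix):
                holders.add(int(entry))
                break
    return sorted(holders)


if __name__ == "__main__":
    for pid in pids_holding_device():
        print(pid)
```

Processes owned by other users are only visible when the script runs with sufficient privileges. If a PID from an old notebook kernel shows up, shutting that kernel down from Jupyter should release the device.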

nikolay
answered a year ago
