RuntimeError: The PyTorch Neuron Runtime could not be initialized
I just started using Neuron on Inf1 and I'm following the examples. The resnet50 example ran without problems, but when I tried the BERT example I got the error below. I went through these troubleshooting steps: Neuron is installed, and I haven't seen any of the other errors listed there.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_27071/3834338657.py in <module>
35
36 # Verify the TorchScript works on both example inputs
---> 37 paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)
38 not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)
39
~/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(372): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(548): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(207): run_op
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/graph.py(196): __call__
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/runtime.py(69): forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1098): _slow_forward
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/nn/modules/module.py(1110): _call_impl
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(307): tb_parse
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/tensorboard.py(533): tb_graph
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/decorators.py(482): maybe_generate_tb_graph_def
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(513): maybe_determine_names_from_tensorboard
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/torch_neuron/convert.py(200): trace
/tmp/ipykernel_27071/3834338657.py(34): <module>
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3553): run_code
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3258): run_cell_async
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(3030): _run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/IPython/core/interactiveshell.py(2976): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/zmqshell.py(528): run_cell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/ipkernel.py(387): do_execute
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(730): execute_request
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(406): dispatch_shell
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(499): process_one
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelbase.py(510): dispatch_queue
/usr/lib64/python3.7/asyncio/events.py(88): _run
/usr/lib64/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib64/python3.7/asyncio/base_events.py(541): run_forever
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/tornado/platform/asyncio.py(215): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel/kernelapp.py(712): start
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/traitlets/config/application.py(982): launch_instance
/home/ec2-user/pytorch_venv/lib64/python3.7/site-packages/ipykernel_launcher.py(17): <module>
/usr/lib64/python3.7/runpy.py(85): _run_code
/usr/lib64/python3.7/runpy.py(193): _run_module_as_main
RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged
to your system logs. See the Neuron Runtime's troubleshooting guide for help on this
topic: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/
When I searched the system log (EC2 > Instances > ... > Get system log) for "neuron", I got these results:
[ 264.104973] neuron: loading out-of-tree module taints kernel.
[ 264.107388] neuron: module verification failed: signature and/or required key missing - tainting kernel
[ 264.113673] Neuron Driver Started with Version:2.3.26.0-67ad286904ed6cc43a8761d89c8477de0ba961e1
[ 264.119896] neuron:nr_reset_thread_fn: nd0: initiating reset
[ 264.139198] neuron:mpset_constructor: reserved 134217728 bytes of host memory
[ 271.253090] neuron:nr_reset_thread_fn: nd0: reset completed
[ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
[ 2069.531541] neuron:npid_detach: neuron:npid_detach: pid=25574, slot=0
[ 2093.038724] neuron:npid_attach: neuron:npid_attach: pid=25666, slot=0
[ 2266.011350] neuron:npid_detach: neuron:npid_detach: pid=25666, slot=0
[ 2271.008477] neuron:npid_attach: neuron:npid_attach: pid=25777, slot=0
[ 4139.513592] neuron:npid_detach: neuron:npid_detach: pid=25777, slot=0
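The `npid_attach` / `npid_detach` lines above can be paired up to see which processes are still attached to the device. The following is a minimal sketch, not an official tool: the function name `attached_pids` is hypothetical, and it assumes (from the line names only, not from Neuron documentation) that an attach with no later detach means the process still holds the device.

```python
import re

# Matches driver lines such as:
# [ 2010.629871] neuron:npid_attach: neuron:npid_attach: pid=25574, slot=0
_EVENT = re.compile(r"neuron:npid_(attach|detach): pid=(\d+), slot=(\d+)")


def attached_pids(log_text):
    """Return {pid: slot} for PIDs that attached and have not detached yet.

    Assumption: an attach without a matching later detach means the
    process is still attached. This is inferred from the log format.
    """
    attached = {}
    for line in log_text.splitlines():
        match = _EVENT.search(line)
        if match is None:
            continue  # not an attach/detach event
        event = match.group(1)
        pid = int(match.group(2))
        slot = int(match.group(3))
        if event == "attach":
            attached[pid] = slot
        else:
            attached.pop(pid, None)
    return attached
```

Running it on the log text shown above returns an empty dict, because every attach there is followed by a detach. A non-empty result would point to a PID worth checking, for example a notebook kernel that is still running.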
Asked 2 years ago, 930 views
1 Answer
Accepted answer
When I executed the notebook cell that runs the compiled script, the Jupyter process printed this output:
ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available - Requested:4 Available:0
The problem was that a notebook from the previous example I had followed was still running, so no NeuronCores were available.
I feel silly now :)
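For anyone who wants to catch this case automatically, here is a minimal sketch that pulls the requested and available core counts out of the NRT error line quoted above. The helper name `parse_core_shortage` is hypothetical, and the pattern is based only on the single message shown in this answer, so other runtime versions may word it differently.

```python
import re

# Pattern taken from the one error line quoted in this answer:
# ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available - Requested:4 Available:0
_SHORTAGE = re.compile(
    r"NeuronCore\(s\) not available - Requested:(\d+) Available:(\d+)"
)


def parse_core_shortage(output_text):
    """Return (requested, available) if the shortage message is present, else None."""
    match = _SHORTAGE.search(output_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available"
    " - Requested:4 Available:0"
)
result = parse_core_shortage(line)
if result is not None and result[1] < result[0]:
    print("Another process may be holding NeuronCores; shut down other notebook kernels.")
```

If `available` is lower than `requested`, the first thing to check is whether another notebook kernel or Python process is still running, as it was in my case.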
Answered 2 years ago