"context deadline exceeded" when using Neuron CLI tools


Hello,

Since I rebooted my Inf1.xlarge instance, the commands "neuron-cli list-model" and "neuron-top" fail with the output "context deadline exceeded". These commands are very helpful for debugging and profiling the performance of my system. I am not aware of any account limits related to the Neuron SDK or Inferentia instances. Does anyone have an explanation for this?

Regards

asked 3 years ago, 461 views
3 Answers

Hello javinewtral,

This error message usually indicates that the Neuron runtime daemon is not running.

First, can you try the following to verify that the 'aws-neuron-dkms' package is installed:

**sudo apt list | grep neuron**  

Assuming it is installed, can you run the following to make sure that the daemon service is started:

**sudo systemctl status neuron-rtd**  

If the service is not running, you can restart it by running:

**sudo systemctl restart neuron-rtd**  
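The status check and restart above can be combined into one step. The following is only a minimal sketch, assuming a systemd-based host and the neuron-rtd unit name used in this answer; the function name ensure_service_running and the SYSTEMCTL override variable are hypothetical helpers, not part of the Neuron SDK.

```shell
#!/usr/bin/env bash
# Sketch: restart a systemd service only if it is not currently active.
# SYSTEMCTL is a hypothetical override so the logic can be tried without systemd;
# by default the real systemctl is used.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

ensure_service_running() {
    local service="$1"
    # "is-active --quiet" prints nothing and exits 0 only when the unit is active.
    if "$SYSTEMCTL" is-active --quiet "$service"; then
        echo "$service is already running"
        return 0
    fi
    echo "$service is not running, restarting"
    # Restart needs root privileges; succeed only if the unit is active afterwards.
    "$SYSTEMCTL" restart "$service" && "$SYSTEMCTL" is-active --quiet "$service"
}
```

From a root shell, calling ensure_service_running neuron-rtd returns a non-zero exit status if the daemon still is not active after the restart, which would be the point to look at the logs.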

Let us know if that works. If the problem persists, then we may need to dive deeper into the logs to understand the failure.

Regards,
Taylor

Edited by: Taylor@AWS on Apr 26, 2021 8:36 AM

AWS
answered 3 years ago

Looks like there is a related issue that results in the same error message.
The OS auto upgrade is dropping the Inferentia kernel modules.

Command-line tools, for example neuron-top, return "context deadline exceeded".

neuron-ls reports the wrong number of "NEURON CORES" but appears to run!

The auto upgrade from 5.4.0-1049-aws to 5.4.0-1051-aws triggers the problem for me.
I'm using the Deep Learning AMI (Ubuntu 18.04).

answered 3 years ago

caffeinate wrote:
> Looks like there is a related issue that results in the same error message.
> The OS auto upgrade is dropping the Inferentia kernel modules.

aws-neuron-dkms is a special kernel module package that depends on the Linux kernel version. This means that when the package is installed through apt, it will only be compatible with the Linux kernel version that was running on the instance during the installation.

You have to re-install this package if you change the Linux kernel version. You can check the current kernel version with **uname -r**, and list the kernels that have the dkms module installed with **dkms status | grep aws-neuron**. Refer to https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-troubleshoot.html#neuron-driver-installation-fails for steps on how to re-install aws-neuron-dkms on a new kernel.
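The comparison between the running kernel and the dkms listing can be scripted. This is a rough sketch, assuming that dkms status prints the module name and the kernel release on the same line; the function name neuron_dkms_covers_kernel is hypothetical and the exact dkms output format may differ between dkms versions.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the given dkms status text lists aws-neuron for the given kernel.
#   $1 = kernel release, for example the output of `uname -r`
#   $2 = text printed by `dkms status`
neuron_dkms_covers_kernel() {
    local kernel="$1" status="$2"
    # Keep only aws-neuron lines, then look for the kernel release as a
    # fixed string (-F) that is not part of a longer version token (-w).
    printf '%s\n' "$status" | grep -F 'aws-neuron' | grep -qwF -- "$kernel"
}
```

For example, neuron_dkms_covers_kernel "$(uname -r)" "$(dkms status)" || echo "aws-neuron-dkms needs to be re-installed for $(uname -r)" prints the message when the running kernel is missing from the listing, which is the situation after an automatic kernel upgrade.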

> Command-line tools, for example neuron-top, return "context deadline exceeded".
>
> neuron-ls reports the wrong number of "NEURON CORES" but appears to run!

That's rather interesting. Do you mind sharing the output neuron-ls is giving you and the output of **apt list | grep aws-neuron**? If we can reproduce this neuron-ls problem internally, we will likely fix it before the next release goes out. Thanks!

answered 3 years ago
