neuron-monitor fails with "error":"unable to get ultraserver status: -2" on EC2


I installed the "Deep Learning AMI Neuron (Ubuntu 22.04)" on an EC2 g4dn.xlarge instance. After logging in to the instance, I ran the neuron-monitor command and got the following errors: {"error":"unable to get ultraserver status: -2"} {"error":"no Neuron Device found - make sure the aws-neuronx-dkms package is installed"}

I verified that the aws-neuronx-dkms package is installed, but neuron-monitor still produces no output other than these errors.

I also tried the "Deep Learning AMI Neuron (Amazon Linux 2023)" and hit the same issue as with Ubuntu.

Thank you for your assistance!

1 Answer
Accepted Answer

It appears that you're encountering an issue because you're trying to use Neuron tools on an instance type that doesn't support them. The g4dn.xlarge instance you're using is not a Neuron-enabled instance.

Neuron is specifically designed for AWS Inferentia and AWS Trainium chips, which are found in Inf1, Inf2, Trn1, and Trn2 instance types. The g4dn.xlarge instance you're using has NVIDIA GPUs, not AWS Inferentia or Trainium chips, which is why you're seeing the "no Neuron Device found" error.

To use Neuron tools like neuron-monitor, you need to launch an instance with one of the following instance types:

  1. Inf1 (for inference workloads with Inferentia chips)
  2. Inf2 (for inference workloads with Inferentia2 chips)
  3. Trn1 or Trn1n (for training workloads with first-generation Trainium chips)
  4. Trn2 (for training workloads with second-generation Trainium chips)
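As a quick sanity check before launching, the family part of an instance type name can be compared against the families listed above. This is only a minimal sketch: the set NEURON_FAMILIES and the function is_neuron_instance_type are hypothetical names made up for illustration, and the set contains just the families named in this answer, so it may be incomplete as AWS adds new ones.

```python
# Instance families named in this answer as carrying Inferentia or Trainium chips.
# Assumption: this list is illustrative and may not cover newer families.
NEURON_FAMILIES = {"inf1", "inf2", "trn1", "trn1n", "trn2"}


def is_neuron_instance_type(instance_type: str) -> bool:
    """Return True if the family part of e.g. 'inf2.xlarge' is a listed Neuron family."""
    family = instance_type.strip().lower().split(".", 1)[0]
    return family in NEURON_FAMILIES


print(is_neuron_instance_type("g4dn.xlarge"))  # False
print(is_neuron_instance_type("inf2.xlarge"))  # True
```

The g4dn.xlarge from the question fails this check, which matches the "no Neuron Device found" error.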

When you're launching an EC2 instance, you need to select one of these instance types instead of g4dn.xlarge. The AMI (Amazon Machine Image) you choose is separate from the instance type selection. You've correctly chosen a Neuron-enabled AMI, but you also need to select a Neuron-compatible instance type.
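Because the AMI and the instance type are chosen independently, it can help to confirm from inside the instance which type it is actually running on. The sketch below queries the EC2 instance metadata service (IMDSv2) using only the Python standard library. The function name current_instance_type is hypothetical, and the sketch assumes the metadata service is enabled and reachable from the instance.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS = "http://169.254.169.254/latest"


def current_instance_type(timeout: float = 2.0):
    """Return this instance's type via IMDSv2, or None if the lookup fails."""
    try:
        # Step 1: request a short-lived session token (IMDSv2).
        token_req = urllib.request.Request(
            f"{IMDS}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()

        # Step 2: read the instance type using that token.
        type_req = urllib.request.Request(
            f"{IMDS}/meta-data/instance-type",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(type_req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:
        # Not on EC2, metadata service disabled, or request timed out.
        return None
```

On the instance from the question, calling current_instance_type() would be expected to return "g4dn.xlarge" regardless of which AMI was used.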

To resolve this issue:

  1. Terminate your current g4dn.xlarge instance.
  2. Launch a new EC2 instance, choosing one of the Inf1, Inf2, Trn1, or Trn2 instance types.
  3. Use the same Neuron-enabled AMI you've been using (either the Ubuntu 22.04 or Amazon Linux 2023 version).

Once you've launched an instance with the correct instance type, the neuron-monitor command should work as expected.
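To confirm that the new instance actually exposes Neuron hardware before running neuron-monitor, one option is to look for the driver's device nodes. The sketch below assumes the Neuron driver creates nodes matching /dev/neuron* (for example /dev/neuron0). Treat that path pattern as an assumption to verify against the Neuron documentation; find_neuron_devices is a hypothetical helper name.

```python
import glob


def find_neuron_devices(pattern: str = "/dev/neuron*"):
    """Return a sorted list of device nodes matching the pattern.

    Assumption: the Neuron driver exposes devices as /dev/neuron0, /dev/neuron1, ...
    """
    return sorted(glob.glob(pattern))


devices = find_neuron_devices()
if devices:
    print("Neuron devices found:", ", ".join(devices))
else:
    print("No Neuron devices found; check the instance type before the driver.")
```

An empty result on a g4dn instance is expected even when the driver package is installed, since there is no Inferentia or Trainium hardware for the driver to bind to.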

Remember, Neuron tools and the associated AMIs are designed to work with AWS's custom AI/ML chips (Inferentia and Trainium), not with GPU-based instances like g4dn.

answered a month ago; reviewed by AWS (EXPERT) 19 days ago and a month ago
