Issue with AWS Neuron SDK example on Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) inf1 instance


I am having trouble with the AWS Neuron SDK when running the PyTorch example from the AWS Neuron GitHub repository on an inf1 instance launched from the Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04).

I followed the steps outlined in the setup.sh script, but encountered an error when executing the following line of code:

cp -f $(find ./venv -name libtorchneuron.so | grep torch_neuronx) libtorch/lib/

The error message was: "cp: missing destination file operand after 'libtorch/lib/'". It seems that the 'find' command is not returning any results, and so the 'cp' command is not receiving a valid source file to copy.
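
To illustrate the diagnosis above, here is a minimal sketch of a guarded version of that copy step. The function name `copy_neuron_lib` is hypothetical and not part of setup.sh; it only wraps the same `find` and `grep` search so that an empty result produces a clear message instead of the confusing `cp` error.

```shell
# Hypothetical helper (not part of setup.sh): copy the library only if find located it.
copy_neuron_lib() {
    search_root="$1"   # e.g. ./venv
    dest_dir="$2"      # e.g. libtorch/lib/
    # Same search as the setup.sh line quoted above; keep only the first match.
    lib_path=$(find "$search_root" -name libtorchneuron.so 2>/dev/null | grep torch_neuronx | head -n 1)
    if [ -z "$lib_path" ]; then
        echo "libtorchneuron.so not found under $search_root" >&2
        return 1
    fi
    cp -f "$lib_path" "$dest_dir"
}

# Usage, from the example directory:
# copy_neuron_lib ./venv libtorch/lib/
```

If the helper reports that nothing was found, running `find ./venv -name 'libtorchneuron*'` without the `grep` filter may show whether the library was installed under a different package directory.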

Additionally, I noticed that the python bert_neuronx/compile.py command was killed, potentially due to a lack of system resources.

This issue is preventing me from successfully completing the setup and running the example. I am unsure if this is due to an error in the setup script, an issue with the instance type, or a problem with the installed packages. Any help you could provide would be greatly appreciated.

Please find the full error message below:

./setup.sh: line 92: 27454 Killed                  python bert_neuronx/compile.py
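
A "Killed" message like the one above means the process received SIGKILL, which on Linux is often, though not always, the kernel's out-of-memory killer. The sketch below is one way to confirm that; `run_and_report` is a hypothetical wrapper, and it relies only on the shell convention that a process ended by signal 9 reports exit status 137 (128 + 9).

```shell
# Hypothetical wrapper: run a command and report whether it died from SIGKILL.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -eq 137 ]; then   # 128 + 9 (SIGKILL)
        echo "killed by SIGKILL; check 'free -h' and the kernel log for out-of-memory messages" >&2
    fi
    return "$status"
}

# Usage, from the example directory:
# run_and_report python bert_neuronx/compile.py
```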

Thank you for your assistance.

Best regards,

1 Answer

It looks like your build failed, meaning the target directory did not exist. Can you confirm the following:

  1. Did the installation of cargo complete correctly?
  2. Are you using an inf1 instance with a fair amount of memory (either 2xlarge or 6xlarge)?
  3. As noted in the tutorial, do you have ~8.5GB of free disk space to work with?
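
The three checks above could be run as a quick pre-flight script before retrying. This is only a sketch: `free_kib` and `preflight` are hypothetical names, the threshold simply mirrors the ~8.5GB figure, and the memory lines are printed for manual comparison against the instance size rather than checked automatically.

```shell
# Free space (in KiB) on the filesystem holding the given directory.
free_kib() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Hypothetical pre-flight check for the three points listed above.
preflight() {
    dir="${1:-.}"
    need_kib=$((85 * 1024 * 1024 / 10))   # ~8.5 GiB expressed in KiB
    # 1. Is cargo installed and on PATH?
    command -v cargo >/dev/null 2>&1 || echo "cargo not found on PATH"
    # 3. Is there enough free disk space where the build runs?
    [ "$(free_kib "$dir")" -ge "$need_kib" ] || echo "less than ~8.5GB free in $dir"
    # 2. Print total and available memory (Linux only) for manual review.
    grep -E 'MemTotal|MemAvailable' /proc/meminfo 2>/dev/null || true
}

# Usage, from the example directory:
# preflight .
```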

If you are still seeing a failure, please remove all files, do a fresh build, and share the full log.

AWS
answered 10 months ago
