gptj_demo compilation failed on Inf2

Hi, I'm trying to run gptj_demo on Inf2 with the AMI "Deep Learning AMI Neuron PyTorch 1.13.0 (Ubuntu 20.04) 20230405". I installed PyTorch Neuron following https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html#pytorch-neuronx-install.

While running:

(aws_neuron_venv_pytorch) ubuntu@ip-172-31-32-224:~$ gptj_demo run gpt-j-6B-split

I got the following exception:

2023-04-26T01:31:26Z ERROR 41469 [WalrusDriver]: Walrus pass: birverifier failed!
2023-04-26T01:31:26Z ERROR 41469 [WalrusDriver]: Failure Reason: === BIR verification failed ===
Reason: Expect memory location to be of type SB
Instruction: I-26932
Opcode: IndirectSave
Input index: 1
Argument AP:
Access Pattern: [[512,4],[512,1],[1,512]]
SymbolicAP
Memory Location: {_reshape_382_hlo_id_3947__mhlo.reshape_32_pftranspose_12031_set}@PSUM
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: ***************************************************************
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: An Internal Compiler Error has occurred
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: ***************************************************************
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Error message: Walrus driver failed to complete
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Error class: AssertionError
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Error location: Unknown
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Command line: /opt/aws_neuron_venv_pytorch/bin/neuronx-cc compile --framework=XLA --target=trn1 /tmp/tmpd5jgl51u/hlo_module.pb --output=/tmp/tmpd5jgl51u/hlo_module.pb.neff --verbose=35

2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Version information:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: NeuronX Compiler version 2.5.0.28+1be23f232
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: HWM version 2.5.0.0-dad732dd6
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: NEFF version Dynamic
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: TVM not available
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: NumPy version 1.21.6
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: MXNet not available
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Artifacts stored in: /tmp/tmpd5jgl51u/neuronxcc-ng2z05sr

I'm not sure whether this is related to --target=trn1, which seems to be hard-coded in: https://github.com/aws-neuron/transformers-neuronx/blob/1e72ddc31976925ba0c79e2ff12301ff3bd6b920/src/transformers_neuronx/compiler.py#L59
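
To double-check which target the compiler was actually invoked with, the "Command line:" entry from the log can be parsed. This is a minimal sketch, assuming the command line is copied verbatim from the log above; `extract_target` is a hypothetical helper written for illustration, not part of neuronx-cc or transformers-neuronx. It only reports the flag value and says nothing about whether that value caused the failure.

```python
import shlex


def extract_target(command_line):
    """Return the value passed to --target in a compiler command line, or None.

    Hypothetical helper: handles both `--target=VALUE` and `--target VALUE`.
    """
    tokens = shlex.split(command_line)
    for i, tok in enumerate(tokens):
        if tok.startswith("--target="):
            return tok.split("=", 1)[1]
        if tok == "--target" and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


# Command line copied from the "Command line:" entry in the error log.
LOGGED_COMMAND = (
    "/opt/aws_neuron_venv_pytorch/bin/neuronx-cc compile --framework=XLA "
    "--target=trn1 /tmp/tmpd5jgl51u/hlo_module.pb "
    "--output=/tmp/tmpd5jgl51u/hlo_module.pb.neff --verbose=35"
)

print(extract_target(LOGGED_COMMAND))  # prints: trn1
```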

Thanks.

AWS
asked a year ago, 406 views
1 Answer

We were unable to reproduce the issue using the packages in the same AMI you used, with the latest transformers-neuronx and the latest transformers installed using "!pip install git+https://github.com/aws-neuron/transformers-neuronx.git transformers -U". If you are still having issues, please share the steps to reproduce the problem, including setup commands and the script to run, in a GitHub issue at https://github.com/aws-neuron/aws-neuron-sdk.
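
When filing the GitHub issue, it may help to attach the installed package versions alongside the setup commands. The following is a minimal sketch using only the Python standard library; the package list is an assumption about what is relevant to this setup, and `collect_versions` is a hypothetical helper, not part of the Neuron SDK.

```python
from importlib import metadata

# Assumed list of relevant pip distributions; adjust to match your environment.
PACKAGES = [
    "torch",
    "torch-neuronx",
    "neuronx-cc",
    "transformers-neuronx",
    "transformers",
]


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one line per package, ready to paste into an issue report.
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name}: {version}")
```

Run it inside the same virtual environment (here, aws_neuron_venv_pytorch) that produced the error, so the reported versions match the failing setup.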

answered a year ago
