Hi, I'm trying to run gptj_demo on an Inf2 instance using the Deep Learning AMI Neuron PyTorch 1.13.0 (Ubuntu 20.04) 20230405, and I installed PyTorch Neuron (torch-neuronx) following https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html#pytorch-neuronx-install.
When I run the following command, I get the exception below:
(aws_neuron_venv_pytorch) ubuntu@ip-172-31-32-224:~$ gptj_demo run gpt-j-6B-split
2023-04-26T01:31:26Z ERROR 41469 [WalrusDriver]: Walrus pass: birverifier failed!
2023-04-26T01:31:26Z ERROR 41469 [WalrusDriver]: Failure Reason: === BIR verification failed ===
Reason: Expect memory location to be of type SB
Instruction: I-26932
Opcode: IndirectSave
Input index: 1
Argument AP:
Access Pattern: [[512,4],[512,1],[1,512]]
SymbolicAP
Memory Location: {_reshape_382_hlo_id_3947__mhlo.reshape_32_pftranspose_12031_set}@PSUM
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: ***************************************************************
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: An Internal Compiler Error has occurred
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: ***************************************************************
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Error message: Walrus driver failed to complete
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Error class: AssertionError
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Error location: Unknown
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Command line: /opt/aws_neuron_venv_pytorch/bin/neuronx-cc compile --framework=XLA --target=trn1 /tmp/tmpd5jgl51u/hlo_module.pb --output=/tmp/tmpd5jgl51u/hlo_module.pb.neff --verbose=35
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Version information:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: NeuronX Compiler version 2.5.0.28+1be23f232
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: HWM version 2.5.0.0-dad732dd6
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: NEFF version Dynamic
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: TVM not available
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: NumPy version 1.21.6
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: MXNet not available
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]:
2023-04-26T01:31:26Z ERROR 41469 [neuronx-cc]: Artifacts stored in: /tmp/tmpd5jgl51u/neuronxcc-ng2z05sr
I'm not sure whether this is related to --target=trn1, which appears to be hard-coded in:
https://github.com/aws-neuron/transformers-neuronx/blob/1e72ddc31976925ba0c79e2ff12301ff3bd6b920/src/transformers_neuronx/compiler.py#L59
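To make the question concrete, here is a minimal sketch that assembles the same neuronx-cc command line shown in the log but takes the target as a parameter instead of hard-coding trn1. The helper name build_neuronx_cc_command is hypothetical and is not the actual code in compiler.py. I'm also assuming that inf2 is an accepted --target value; I haven't verified that, or whether the target is the cause of the error at all.

```python
import shlex
from typing import List


def build_neuronx_cc_command(hlo_path: str, neff_path: str,
                             target: str = "trn1", verbose: int = 35) -> List[str]:
    """Assemble a neuronx-cc compile command like the one in the error log.

    Hypothetical helper for illustration only: `target` replaces the
    hard-coded "--target=trn1". That "inf2" is a valid value is an assumption.
    """
    return [
        "neuronx-cc", "compile",
        "--framework=XLA",
        f"--target={target}",   # parameterized instead of hard-coded
        hlo_path,
        f"--output={neff_path}",
        f"--verbose={verbose}",
    ]


if __name__ == "__main__":
    cmd = build_neuronx_cc_command("/tmp/work/hlo_module.pb",
                                   "/tmp/work/hlo_module.pb.neff",
                                   target="inf2")
    # Only prints the command; passing `cmd` to subprocess.run(cmd, check=True)
    # would actually invoke the compiler.
    print(shlex.join(cmd))
```

With the default argument the helper reproduces the command from the log exactly, so the only difference is the value passed for target.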
Thanks.