ModuleNotFoundError: No module named 'nvgpu' in SageMaker batch transform

I am trying to run batch transform inference on a GPU using an ml.g4dn.xlarge instance. However, when I run the inference, I get ModuleNotFoundError: No module named 'nvgpu'. I tried adding the nvgpu library to the model's requirements.txt file at training time, but I have not found a version that works. I also read here: https://github.com/pytorch/serve/issues/1813#issuecomment-1231025086 that the issue could be the framework_version, but I tried 1.9.0 as suggested there and the issue remains. Any idea what the cause could be, or which version of nvgpu to add to requirements.txt?

Alex
asked a year ago, 1275 views
2 Answers
Hi Alex,

I just tried to replicate this using the following example:

https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_batch_inference/sagemaker_batch_inference_torchserve.ipynb

I did not face this issue. I tried framework versions "1.9.0" and "1.13.1" with the instance type "ml.g4dn.xlarge". Can you try a more recent framework version, 1.13.1 or later?
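For reference, here is a minimal sketch of how the framework version could be pinned when creating the model for batch transform. The helper names (build_model_kwargs, create_transformer), the S3 path, the role ARN, and the entry point file name are hypothetical placeholders, and py_version="py39" is an assumption about which Python version pairs with 1.13.1. Adjust these to your own setup.

```python
def build_model_kwargs(model_data, role, entry_point,
                       framework_version="1.13.1", py_version="py39"):
    """Collect the arguments for sagemaker.pytorch.PyTorchModel.

    framework_version selects the PyTorch serving container; py_version
    must be one that container supports (py39 here is an assumption).
    """
    return {
        "model_data": model_data,
        "role": role,
        "entry_point": entry_point,
        "framework_version": framework_version,
        "py_version": py_version,
    }


def create_transformer(model_kwargs, instance_type="ml.g4dn.xlarge"):
    """Build a batch transformer; needs the sagemaker SDK and AWS credentials."""
    # Imported lazily so the kwargs helper can be used without the SDK installed.
    from sagemaker.pytorch import PyTorchModel

    model = PyTorchModel(**model_kwargs)
    return model.transformer(instance_count=1, instance_type=instance_type)


# Placeholder values, not real resources.
kwargs = build_model_kwargs(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    entry_point="inference.py",
)
print(kwargs["framework_version"], kwargs["py_version"])
```

Calling create_transformer(kwargs) and then transform(...) on the result would start the job. Changing only framework_version and py_version between runs makes it easier to tell whether the container version is what triggers the error.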

If you are still facing the issue, please share the code you are running.
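When sharing details, it may also help to include what the container actually has installed. The snippet below is a small sketch (the function name environment_report is made up for illustration) that could be run from the inference script to log the Python version and whether torch and nvgpu are importable. It uses only the standard library.

```python
import importlib.util
import platform


def environment_report(modules=("torch", "nvgpu")):
    """Return the Python version and whether each named module can be found."""
    report = {"python": platform.python_version()}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(environment_report())
```

If nvgpu shows up as False, the package is missing from the serving container's Python environment, which matches the ModuleNotFoundError in the question.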

AWS
answered a year ago
EXPERT
reviewed a month ago
I am also encountering this issue with py_version="py38" and framework_version="1.12" on ml.p2.xlarge. Any solution would be appreciated.

Samuel
answered 4 months ago