Hello there,
I understand that you launched the AWS Deep Learning AMI GPU TensorFlow 2.10.0 (Amazon Linux 2) 20220927 with instance type g4dn.xlarge. This was done successfully. Upon ssh-ing into your instance you ran the following in the terminal:
$/usr/local/bin/python3.9 -c "import tensorflow"
This was to verify that TensorFlow is installed on this instance type. However, doing so resulted in the errors shown in your re:Post question. Please let me know if I have misunderstood anything.
I did some research on this and found that it is a common issue with TensorFlow version 2.10 [1]. Starting with TensorFlow 2.10, the Linux CPU builds for Aarch64/ARM64 processors are built and maintained by AWS [2]. Installing TensorFlow on these machines installs tensorflow-cpu-aws by default [3], which is already installed on your instance. TensorFlow fails to load on older GPUs when CUDA_FORCE_PTX_JIT=1 is set [4], which may be the case for this instance type. The cuBLAS plugin error and the warnings you saw when importing TensorFlow do not appear to prevent you from proceeding with TensorFlow [1].
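One way to sanity-check the diagnosis above is to query the package metadata and see which TensorFlow distribution is actually installed on the instance. This is a minimal sketch using only the Python standard library; the distribution names it checks are the ones mentioned above.

```python
# Sketch: report which TensorFlow distribution (if any) is installed.
# Uses only the standard library; returns None when a package is absent.
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version of dist_name, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow-cpu-aws"):
        print(name, "->", installed_version(name) or "not installed")
```

If tensorflow-cpu-aws shows up here instead of the GPU-enabled tensorflow package, that would explain why the import succeeds while GPU support is missing.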
- Install Miniconda
$curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
$bash Miniconda3-latest-Linux-x86_64.sh
- Create a conda environment and activate the environment
$conda create --name tf_env python=3.9
$conda activate tf_env
- Install TensorFlow
$pip install tensorflow==2.9.2
I hope you found the provided information helpful and thank you again for reaching out to Premium Support. Should you have any further questions or require any additional assistance, please feel free to reach out and I will be more than happy to assist.
Resources
- https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras
- https://blog.tensorflow.org/2022/09/announcing-tensorflow-official-build-collaborators.html
- https://pypi.org/project/tensorflow-cpu-aws/
- https://github.com/tensorflow/tensorflow/issues/57679
- https://www.tensorflow.org/install/pip
- https://docs.conda.io/en/latest/miniconda.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html
Hi
Thank you for reaching out to us.
I understand that you are using the g4dn.xlarge instance type with the Deep Learning AMI GPU TensorFlow 2.10.0 (Amazon Linux 2), that you are having issues using the cuDNN library, and that you would like to confirm whether it is part of the DLAMI.
In order to deep-dive further, I would request you to confirm the following.
Query the AMI ID with the AWS CLI (example region is us-east-1):
$aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning AMI GPU TensorFlow 2.10.? (Amazon Linux 2) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
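The JMESPath expression in that command selects the most recently created image. If you prefer to post-process the describe-images JSON output yourself, the same selection can be sketched in Python (the sample records below are hypothetical, not a real AMI listing):

```python
# Sketch: pick the newest image from `aws ec2 describe-images` JSON output,
# mirroring the JMESPath `reverse(sort_by(Images, &CreationDate))[:1].ImageId`.
def newest_image_id(images):
    """Return the ImageId with the latest CreationDate, or None if empty."""
    if not images:
        return None
    # ISO-8601 timestamps compare correctly as plain strings.
    return max(images, key=lambda img: img["CreationDate"])["ImageId"]

# Hypothetical sample records for illustration only.
sample = [
    {"ImageId": "ami-aaaa", "CreationDate": "2022-09-01T00:00:00.000Z"},
    {"ImageId": "ami-bbbb", "CreationDate": "2022-09-27T00:00:00.000Z"},
]
print(newest_image_id(sample))  # -> ami-bbbb
```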
Please review the release notes for the aws-deep-learning-ami-gpu-tensorflow-2-10-amazon-linux-2 AMI, confirm that you are using the latest version, and open an AWS support ticket if you need further guidance.
For security reasons, this public post is not a suitable place to share details of a customer's resources.
If you have other questions or require further clarification, please don't hesitate to open a support ticket with AWS Premium Support, and we would be glad to assist you with further investigation.
Reference:
https://aws.amazon.com/releasenotes/aws-deep-learning-ami-gpu-tensorflow-2-10-amazon-linux-2/
Hi Cebz,
Your suggestion gets me a working version of TensorFlow 2.9.2; in fact, there is no need to even use conda, since downgrading to TensorFlow 2.9.2 works on its own. But I do not want to use TensorFlow 2.9.2. If I did, I would have selected one of the available TensorFlow 2.9.2 AMIs. For my task, I require some of the features released in 2.10.0. I have attempted the conda solution with TensorFlow 2.10.0, and it did not work.
You are correct that the error does not prevent me from using TensorFlow, but it does prevent me from using the GPU with TensorFlow. If I do not use a GPU for my task, it would take weeks to finish training the model. GPU usage is not optional.
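For reference, the quickest way I know to see whether TensorFlow can use the GPU at all is to list the physical devices it detects. This is a small sketch; it assumes nothing beyond TensorFlow itself and degrades gracefully when TensorFlow is not importable:

```python
# Sketch: list the GPUs TensorFlow can see; an empty list means CPU-only.
def visible_gpus():
    """Return names of GPUs TensorFlow detects, or [] if TF is unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [dev.name for dev in tf.config.list_physical_devices("GPU")]

if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs visible:", gpus if gpus else "none")
```

On these instances this check comes back empty for me, which is exactly the problem.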
I have attempted using TensorFlow 2.10.0 on the AWS Deep Learning AMI GPU TensorFlow 2.10.0 with the following instance types: g4dn.xlarge (NVIDIA T4 GPU), g5.xlarge (NVIDIA A10G Tensor Core GPU), and p3.2xlarge (NVIDIA Tesla V100 GPU). None of these are "older GPUs", especially not the V100.
All encountered the exact same issue. I am stunned that the GPU-accelerated computing instances advertised for ML cannot use their accelerators for ML tasks with the latest Amazon ML AMI.
I would appreciate further assistance in this matter.