The environment I'm using is:
- aws p4dn.24xlarge instance (NVIDIA Ampere A100 GPU )
- cuda 10.1
- tensorflow 2.3.0
- python 3.6.9
I get an error when I run the following. What is the reason?
tensorflow.test.is_gpu_available()
2022-01-23 07:56:08.088849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:10:1c.0 name: A100-SXM4-40GB computeCapability: 8.0 coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s 2022-01-23 07:56:08.088936: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 2022-01-23 07:56:08.089013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10 2022-01-23 07:56:08.089030: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10 2022-01-23 07:56:08.089046: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10 2022-01-23 07:56:08.089059: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10 2022-01-23 07:56:08.089074: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10 2022-01-23 07:56:08.089090: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7 2022-01-23 07:56:08.092700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0 Traceback (most recent call last): File "", line 1, in File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func return func(*args, **kwargs) File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/test_util.py", line 1563, in is_gpu_available for local_device in device_lib.list_local_devices(): File "/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/client/device_lib.py", line 43, in list_local_devices _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config) RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid