Failure to register v1.1 AWS DeepLens camera on Ubuntu 20.04 LTS
> The problem is related to registering an **AWS DeepLens v1.1 camera**. I am failing to register the camera from machines running Ubuntu 20.04 (focal), Mint 19.3 (tessa), and Ubuntu bionic, with the error: **Device not detected**. This happens when following the *Register an AWS DeepLens v1.1 device* steps in the AWS web console. Going by other forums, I am led to believe that the **awscam** package needs to be installed on the Linux machine. I am also failing to install that package, with the error below. Is there an awscam package for Debian distributions, and more precisely for Ubuntu 20.04?

**sudo apt-get install awscam**
```
sudo apt-get install awscam
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package awscam
```

**lsb_release -a**
```
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal
```
Why is the GPU not working out of the box for a Deep Learning AMI EC2 instance?
I'm having trouble using the GPU on a Deep Learning GPU EC2 instance. The specs of the instance are:

- Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328
- amazon/Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328

When I log into the instance and run `nvidia-smi`, I get the error:

`NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.`

Similarly, when I run a (pre-installed) PyTorch command to check whether it can see a GPU, it returns False:

```
(pytorch) [ec2-user@ip-172-31-86-58 ~]$ python3
Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
```

The GPU setup should have worked out of the box, so how do I fix this?
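Here is a diagnostic sketch of what can be checked first. A frequent cause on Deep Learning AMIs is a kernel update (e.g., applied on first boot) that the NVIDIA module has not been rebuilt against; this sketch assumes the stock DLAMI layout where the driver is managed by DKMS, and the dkms/reboot steps require sudo (they are skipped if the tools are absent):

```shell
# Check whether the nvidia kernel module is actually loaded.
if grep -q '^nvidia ' /proc/modules; then
  gpu_status="nvidia module loaded"
else
  gpu_status="nvidia module NOT loaded"
fi
echo "$gpu_status"

# If it is not loaded, see whether DKMS built the driver for the running
# kernel, rebuild it for the current kernel, and reboot afterwards.
if command -v dkms >/dev/null 2>&1 && command -v sudo >/dev/null 2>&1; then
  sudo dkms status        # lists driver versions and the kernels they were built for
  sudo dkms autoinstall   # rebuild registered modules for the running kernel
  # sudo reboot           # uncomment once the rebuild succeeds
fi
```

If `dkms status` shows the driver built only for an older kernel than `uname -r` reports, the rebuild (or simply pinning the kernel version) is usually the fix.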
Want to create a g5g instance for EC2
I want to create a g5g instance for my EC2 deep learning. However, when I choose Ubuntu Server 20.04 LTS (HVM), SSD Volume Type, 64-bit (x86), the g5g family is not supported. I switched to Ubuntu Server 20.04 LTS (HVM), SSD Volume Type, Arm, which the g5g family supports, but when I try to create the instance I get the following error message: "The requested configuration is currently not supported. Please check the documentation for supported configurations." I also switched to the AWS Deep Learning AMI (Ubuntu 18.04), but the g5g series is not supported there either. So I don't know how I can create a g5g family instance on EC2. Can anyone help me with this?
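One thing worth checking: g5g instances pair Graviton2 (arm64) CPUs with NVIDIA T4G GPUs, so only arm64 AMIs boot on them, and they are offered only in a subset of Regions and Availability Zones, which commonly produces exactly this "requested configuration is currently not supported" error. A sketch for checking where the type is actually offered (the Region below is just an example, substitute your own):

```shell
# List the Availability Zones that actually offer g5g.xlarge in a Region.
if command -v aws >/dev/null 2>&1; then
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters Name=instance-type,Values=g5g.xlarge \
    --region us-east-1 || true   # ignore credential errors in a dry run
  offering_check="ran"
else
  offering_check="skipped (aws CLI not installed)"
fi
echo "$offering_check"
```

If the Region comes back empty, launching in a different Region (or requesting a different AZ) is the way forward; no choice of AMI will make an unoffered type launch.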
DeepLens v1 registration failed
Hello,

I have registered my DeepLens v1 camera 6 times already and I still get the same result in the AWS console: registration status failed. I have no idea why this is happening, since I am following the exact steps in the official documentation and the registration tutorial videos. No answers about such issues were provided here, which is why I am posting this question. What might be the reasons behind the registration failing? I uploaded the certificates, provided the roles needed, and connected the camera to the internet. The documentation only asks me to repeat the registration, but I've done that many times already and nothing was solved.
Mining cryptocurrency with AWS
Hi, I recently looked at creating my own VM using a p4d.24xlarge instance type (8x Tesla A100, 96 vCPUs) to help increase my hash rate mining Ethereum with AWS. I was met with an issue advising that I would not be able to secure this, apparently for the following reason:

“EC2 service team has informed, they will not be able to approve this limit increase as per this moment. The reason why the quota increase was denied is related to a rapid increase in the customer's demand for our high-performance GPU instances, over the last few months, that is surpassing our availability.”

I was pretty sure Amazon does have this available, but maybe because I'm a new customer they wouldn't allow me to use this setup? Maybe I have to build a track record with them or something, but it's really frustrating trying to get any decent GPU and CPU setup with AWS that's cost-effective to mine crypto with. Any help or advice on this would be really appreciated.

Thanks
Mikey
Setting up data for DeepAR: targets and categories for simultaneous time series?
I would like to try out DeepAR for an engineering problem that I have some sensor datasets for, but I am unsure how to set it up for ingestion into DeepAR to get a predictive model. The data is essentially the positions, orientations, and a few other timeseries sensor readings of an assortment of objects (animals, in this case, actually) over time. Data is both noisy and sometimes missing.

So, in this case, there are N individuals, and for each individual there are Z variables of interest. None of the variables are "static" (color, size, etc.); they are all expected to be time-varying on the same time scale. Ultimately, I would like to try to predict all Z targets for all N individuals.

How do I set up the timeseries to feed into DeepAR? The premise is that all these individuals are implicitly interacting in the observed space, so all the target values have some interdependence on each other, which is what I would like to see if DeepAR can take into account to make predictions.

Should I be using a category vector of length 2, such that the first cat variable corresponds to the individual and the second corresponds to one of that individual's variables? Then there would be N*Z targets in my input dataset, each with `cat = [n, z]`, where there are N distinct values for n and Z distinct values for z?
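To make the proposed layout concrete: the SageMaker DeepAR JSON Lines input format supports exactly this flattening, one line per (individual, variable) series with the pair encoded in `cat`. A sketch with placeholder timestamps and values (`start`, `target`, and `cat` are the real DeepAR field names; per the DeepAR input docs, missing readings in `target` can be encoded as `null`):

```json
{"start": "2023-01-01 00:00:00", "target": [12.1, 12.4, null, 12.9], "cat": [0, 0]}
{"start": "2023-01-01 00:00:00", "target": [0.62, 0.61, 0.65, 0.64], "cat": [0, 1]}
{"start": "2023-01-01 00:00:00", "target": [11.8, 11.7, 11.9, 12.0], "cat": [1, 0]}
```

One caveat on the interdependence question: DeepAR trains a single global model across all series, and the category embeddings let it share structure between individuals and variables, but each series is still forecast on its own — it does not explicitly condition one series' forecast on another series' observed values.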
Not able to clone a conda environment on Deep Learning AMIs
I am trying to clone a conda TensorFlow environment on an EC2 instance, but I am unable to do so. I need this because I am installing some libraries that are not compatible with my current environment. I have a working environment now and I want to keep it to run other experiments while I meddle with the cloned version and the new library installations. I get the errors below while trying to clone:

```
conda create --name tensorflow2_p38_clone --clone tensorflow2_p38
Source:      /home/ubuntu/anaconda3/envs/tensorflow2_p38
Destination: /home/ubuntu/anaconda3/envs/tensorflow2_p38_clone
Packages: 445
Files: 53101
Preparing transaction: done
Verifying transaction: /
SafetyError: The package for nb_conda located at /home/ubuntu/anaconda3/pkgs/nb_conda-2.2.1-py38h578d9bd_4
appears to be corrupted. The path 'lib/python3.8/site-packages/nb_conda/envmanager.py'
has an incorrect size.
  reported size: 6179 bytes
  actual size: 6209 bytes

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'bin/sip'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'include/python3.8/sip.h'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/__pycache__/sipconfig.cpython-38.pyc'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/__pycache__/sipdistutils.cpython-38.pyc'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/sipconfig.py'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/sipdistutils.py'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/noarch::jupyter_client-7.0.6-pyhd8ed1ab_0, conda-forge/noarch::nbclient-0.5.8-pyhd8ed1ab_0
  path: 'bin/jupyter-run'
done
Executing transaction: |
For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environment variable OMPI_MCA_opal_cuda_support=true before
launching your MPI processes. Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...
In addition, the UCX support is also built but disabled by default.
To enable it, first install UCX (conda install -c conda-forge ucx). Then, set the environment
variables OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes.
Equivalently, you can set the MCA parameters in the command line:
mpiexec --mca pml ucx --mca osc ucx ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX.
Please consult UCX's documentation for detail.
- Config option `kernel_spec_manager_class` not recognized by `EnableNBExtensionApp`.
Enabling notebook extension jupyter-js-widgets/extension...
- Validating: OK
\ Enabling nb_conda_kernels...
CONDA_PREFIX: /home/ubuntu/anaconda3/envs/tensorflow2_p38_clone
Status: enabled
/ Config option `kernel_spec_manager_class` not recognized by `EnableNBExtensionApp`.
Enabling notebook extension nb_conda/main...
- Validating: OK
Enabling tree extension nb_conda/tree...
- Validating: OK
Config option `kernel_spec_manager_class` not recognized by `EnableServerExtensionApp`.
Enabling: nb_conda
- Writing config: /home/ubuntu/anaconda3/envs/tensorflow2_p38_clone/etc/jupyter
- Validating...
nb_conda 2.2.1 OK
```

I had the same issue previously on a different EC2 instance as well, which I ignored at the time.
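Worth noting: in the log above the transaction actually runs to completion (the "Executing transaction" section ends with the Jupyter extensions validating OK), so the SafetyError/ClobberError lines may have been emitted as warnings and the clone may already exist. If the clone is genuinely broken, one common workaround is to remove the cache entry conda flagged as corrupted and retry. A sketch (the environment names and the pkgs path are the ones from the log above):

```shell
if command -v conda >/dev/null 2>&1; then
  conda env list   # is tensorflow2_p38_clone already there despite the warnings?
  # Remove the cache entry conda reported as corrupted, then retry the clone;
  # conda will re-extract the package from its tarball or re-download it.
  rm -rf /home/ubuntu/anaconda3/pkgs/nb_conda-2.2.1-py38h578d9bd_4
  conda create --name tensorflow2_p38_clone --clone tensorflow2_p38
  clone_status="clone retried"
  # Fallback if cloning keeps failing: rebuild from an exported spec.
  # conda env export -n tensorflow2_p38 > tf2.yml
  # conda env create -n tensorflow2_p38_clone -f tf2.yml
else
  clone_status="skipped (conda not on PATH)"
fi
echo "$clone_status"
```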
How do I make an ECS cluster spawn GPU instances with more root volume than default?
I need to deploy an ML app that needs GPU access for its response times to be acceptable (it uses some heavy networks that run too slowly on CPU). The app is containerized and uses an nvidia/cuda base image so that it can make use of its host machine's GPU. The image alone weighs ~10GB, and during startup it pulls several ML models and data that take up about another ~10GB of disk.

We were previously running this app on Elastic Beanstalk, but we realized it doesn't support GPU usage, even when specifying a Deep Learning AMI, so we migrated to ECS, which provides more configurability than the former. However, we soon ran into a new problem: **selecting a g4dn instance type when creating a cluster, which defaults the AMI to an ECS GPU one, turns the Root EBS Volume Size field into a Data EBS Volume Size field.** This causes the instance's 22GB root volume (the only one that comes formatted and mounted) to be too small for pulling our image and downloading the data it needs during startup. The other volume (of whatever size I specify in the new Data EBS Volume Size field during creation) is not mounted and therefore not accessible by the container. Additionally, g4dn instances come with a 125GB SSD that is not mounted either. If either of these were usable, or if it were possible to enlarge the root volume (which it is when using the default non-GPU AMI), ECS would be the perfect solution for us at this time.

For the moment, we have worked around this issue by creating an *empty* cluster in ECS and then manually creating and attaching an Auto Scaling group to it, since when using a launch configuration or template the root volume's size can be specified correctly, even when using the exact same ECS GPU AMI that ECS does. However, this is a tiresome process, and it makes us lose valuable ECS functionality, such as automatically spawning a new instance during a rolling update to maintain capacity.

Am I missing something here? Is this a bug that will be fixed at some point? If it's not, is there a simpler way to achieve what I need? Maybe by specifying a custom launch configuration for the ECS cluster, or by automatically mounting the SSD on instance launch? Any help is more than appreciated. Thanks in advance!
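For reference, the launch-template part of the workaround described above can be scripted. A sketch (the template name is made up; `/dev/xvda` is the root device name on the Amazon Linux 2 based ECS GPU AMI — verify it against your AMI's block device mapping; `ImageId`, networking, and the ECS user data are omitted):

```shell
# Create a launch template that keeps the ECS GPU AMI but enlarges the
# root volume; an Auto Scaling group built from it can then be attached
# to the cluster (e.g., via an ECS capacity provider).
if command -v aws >/dev/null 2>&1; then
  aws ec2 create-launch-template \
    --launch-template-name ecs-gpu-bigroot \
    --launch-template-data '{
      "InstanceType": "g4dn.xlarge",
      "BlockDeviceMappings": [
        {"DeviceName": "/dev/xvda",
         "Ebs": {"VolumeSize": 100, "VolumeType": "gp3"}}
      ]
    }' || true   # ignore credential errors in a dry run
  lt_status="create-launch-template attempted"
else
  lt_status="skipped (aws CLI not installed)"
fi
echo "$lt_status"
```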
Custom AMI with NVIDIA driver via Packer
Hello, I have a curious problem that I can't seem to make heads or tails of. I am using Packer to create a custom AMI that includes the NVIDIA driver. I am using the base Ubuntu 16.04 image on a g4dn.xlarge instance as the Packer base, and Packer completes successfully. It installs the NVIDIA driver using the NVIDIA-Linux-x86_64-440.33.01.run package. Everything looks good, but when I provision a new EC2 instance from this AMI, the NVIDIA driver is not loaded and I get "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." when running nvidia-smi. I have tried adding extra steps to the Packer project that follow the steps outlined here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#public-nvidia-driver, but the result is always the same. What am I missing here?

Thanks
Jeremy
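One pattern worth checking: the .run installer builds the kernel module only for the kernel running at build time, so if the instance launched from the AMI boots a different kernel (for instance after an unattended upgrade on first boot), the module silently fails to load. Registering the driver with DKMS at bake time makes it rebuild for whatever kernel boots. A sketch for the Packer shell provisioner (assumes `dkms` and a matching `linux-headers-$(uname -r)` package are installed first; the driver file name is the one from the question):

```shell
# Prerequisite (on Ubuntu): sudo apt-get install -y dkms "linux-headers-$(uname -r)"
if [ -f NVIDIA-Linux-x86_64-440.33.01.run ]; then
  # --dkms registers the module with DKMS so it is rebuilt on kernel changes;
  # --silent runs the installer non-interactively.
  sudo sh NVIDIA-Linux-x86_64-440.33.01.run --silent --dkms
else
  driver_status="installer not found; skipping"
fi
echo "${driver_status:-installed}"
```

Comparing `uname -r` on the Packer build instance against a freshly provisioned instance would confirm whether a kernel mismatch is the cause.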