
Questions tagged with AWS Deep Learning AMIs



Setting up data for DeepAR, targets and categories for simultaneous data?

I would like to try out DeepAR for an engineering problem for which I have some sensor datasets, but I am unsure how to set the data up for ingestion into DeepAR to get a predictive model. The data is essentially the positions, orientations, and a few other time-series sensor readings of an assortment of objects (animals, in this case, actually) over time. The data is both noisy and sometimes missing. There are N individuals, and for each individual there are Z variables of interest. None of the variables are "static" (color, size, etc.); they are all expected to be time-varying on the same time scale. Ultimately, I would like to predict all Z targets for all N individuals.

How do I set up the time series to feed into DeepAR? The premise is that all these individuals are implicitly interacting in the observed space, so the target values have some interdependence on each other, which is what I would like to see if DeepAR can take into account when making predictions. Should I be using a category vector of length 2, where the first cat variable corresponds to the individual and the second corresponds to one of the variables associated with that individual? Then there would be N*Z targets in my input dataset, each with `cat = [n, z]`, where there are N distinct values for n and Z distinct values for z (a sketch of this layout is shown after this question).
1 answer · 0 votes · 68 views · asked 7 months ago
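A minimal sketch of what that layout could look like in DeepAR's JSON Lines training format, assuming one time series per (individual, variable) pair; the field names `start`, `target`, and `cat` are the ones DeepAR expects, while the file name, timestamps, and values are purely illustrative:

```
# Hypothetical example: two of the N*Z series, one per (individual n, variable z) pair.
# Each line is an independent JSON object; missing readings are typically encoded as "NaN".
cat > train.json <<'EOF'
{"start": "2023-01-01 00:00:00", "target": [0.12, 0.15, "NaN", 0.11], "cat": [0, 0]}
{"start": "2023-01-01 00:00:00", "target": [3.4, 3.1, 3.3, 3.6], "cat": [0, 1]}
EOF
```

With this encoding the `cardinality` hyperparameter would be set to `[N, Z]`; note that DeepAR uses `cat` as grouping information for a single global model rather than modeling explicit interactions between series.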

Not able to clone a conda environment on Deep Learning AMIs

I am trying to clone a conda TensorFlow environment on an EC2 instance, but I am unable to do so. I need this because I am installing some libraries that are not compatible with my current environment. I have a working environment now and I want to keep it for running other experiments while I meddle with the cloned version and the new library installations. I get the errors below while trying to clone:

```
conda create --name tensorflow2_p38_clone --clone tensorflow2_p38
Source:      /home/ubuntu/anaconda3/envs/tensorflow2_p38
Destination: /home/ubuntu/anaconda3/envs/tensorflow2_p38_clone
Packages: 445
Files: 53101
Preparing transaction: done
Verifying transaction: /
SafetyError: The package for nb_conda located at /home/ubuntu/anaconda3/pkgs/nb_conda-2.2.1-py38h578d9bd_4
appears to be corrupted. The path 'lib/python3.8/site-packages/nb_conda/envmanager.py'
has an incorrect size.
  reported size: 6179 bytes
  actual size: 6209 bytes

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'bin/sip'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'include/python3.8/sip.h'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/__pycache__/sipconfig.cpython-38.pyc'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/__pycache__/sipdistutils.cpython-38.pyc'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/sipconfig.py'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/linux-64::pyqt5-sip-4.19.18-py38h709712a_8, conda-forge/linux-64::sip-4.19.25-py38h709712a_1
  path: 'lib/python3.8/site-packages/sipdistutils.py'

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: conda-forge/noarch::jupyter_client-7.0.6-pyhd8ed1ab_0, conda-forge/noarch::nbclient-0.5.8-pyhd8ed1ab_0
  path: 'bin/jupyter-run'

done
Executing transaction: |
For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environment variable OMPI_MCA_opal_cuda_support=true before
launching your MPI processes. Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...

In addition, the UCX support is also built but disabled by default. To enable it, first install
UCX (conda install -c conda-forge ucx). Then, set the environment variables
OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes.
Equivalently, you can set the MCA parameters in the command line:
mpiexec --mca pml ucx --mca osc ucx ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX.
Please consult UCX's documentation for detail.

- Config option `kernel_spec_manager_class` not recognized by `EnableNBExtensionApp`.
Enabling notebook extension jupyter-js-widgets/extension...
- Validating: OK
\ Enabling nb_conda_kernels...
CONDA_PREFIX: /home/ubuntu/anaconda3/envs/tensorflow2_p38_clone
Status: enabled
/ Config option `kernel_spec_manager_class` not recognized by `EnableNBExtensionApp`.
Enabling notebook extension nb_conda/main...
- Validating: OK
Enabling tree extension nb_conda/tree...
- Validating: OK
Config option `kernel_spec_manager_class` not recognized by `EnableServerExtensionApp`.
Enabling: nb_conda
- Writing config: /home/ubuntu/anaconda3/envs/tensorflow2_p38_clone/etc/jupyter
- Validating...
    nb_conda 2.2.1 OK
```

I had the same issue previously on a different EC2 instance as well, which I ignored at the time.
0 answers · 0 votes · 49 views · asked 7 months ago
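No answer was posted in the thread. As a hedged sketch (not from the post, and using only the environment names that appear above), two approaches commonly used to sidestep SafetyError/ClobberError failures are to rebuild the environment from an exported spec instead of cloning it, or to clear the package cache before retrying the clone:

```
# Option 1: recreate rather than clone, from an exported spec of the working env.
conda env export -n tensorflow2_p38 > tensorflow2_p38.yml
conda env create -n tensorflow2_p38_clone -f tensorflow2_p38.yml

# Option 2: clear possibly corrupted cached packages and tarballs, then retry the clone.
conda clean --packages --tarballs
conda create --name tensorflow2_p38_clone --clone tensorflow2_p38
```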

How do I make an ECS cluster spawn GPU instances with more root volume than default?

I need to deploy an ML app that needs GPU access for its response times to be acceptable (it uses some heavy networks that run too slowly on CPU). The app is containerized and uses an nvidia/cuda base image so that it can make use of its host machine's GPU. The image alone weighs ~10 GB, and during startup it pulls several ML models and data that take up roughly another ~10 GB of disk.

We were previously running this app on Elastic Beanstalk, but we realized it doesn't support GPU usage, even when specifying a Deep Learning AMI, so we migrated to ECS, which provides more configurability than the former. However, we soon ran into a new problem: **selecting a g4dn instance type when creating a cluster, which defaults the AMI to an ECS GPU one, turns the Root EBS Volume Size field into a Data EBS Volume Size field.** This causes the instance's 22 GB root volume (the only one that comes formatted and mounted) to be too small for pulling our image and downloading the data it needs during startup. The other volume (of whatever size I specify during creation in the new Data EBS Volume Size field) is not mounted and therefore not accessible to the container. Additionally, the g4dn instances come with a 125 GB SSD that is not mounted either. If either of these were usable, or if it were possible to enlarge the root volume (which it is when using the default non-GPU AMI), ECS would be the perfect solution for us at this time.

For the moment, we have worked around this issue by creating an *empty* cluster in ECS and then manually creating and attaching an Auto Scaling group to it, since when using a Launch Configuration or Launch Template the root volume's size can be specified correctly, even when using the exact same ECS GPU AMI that ECS does (a sketch of such a template is shown below). However, this is a tiresome process, and it makes us lose valuable ECS functionality such as automatically spawning a new instance during a rolling update to maintain capacity.

Am I missing something here? Is this a bug that will be fixed at some point? If it's not, is there a simpler way to achieve what I need? Maybe by specifying a custom launch configuration for the ECS cluster, or by automatically mounting the SSD on instance launch? Any help is more than appreciated. Thanks in advance!
0 answers · 0 votes · 34 views · asked 8 months ago
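As a hedged illustration of the launch-template workaround mentioned in the question (the template name, AMI ID, volume size, and the root device name `/dev/xvda` are assumptions, not details from the post), a larger root volume can be requested in the template's block device mappings:

```
# Assumption: AWS CLI v2 is configured and the chosen ECS GPU-optimized AMI's
# root device is /dev/xvda; all names and sizes here are illustrative only.
aws ec2 create-launch-template \
  --launch-template-name ecs-gpu-large-root \
  --launch-template-data '{
    "ImageId": "ami-EXAMPLE",
    "InstanceType": "g4dn.xlarge",
    "BlockDeviceMappings": [
      {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 100, "VolumeType": "gp3"}}
    ]
  }'
```

An Auto Scaling group built from such a template can then be attached to the cluster, as described in the question's workaround.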