I am trying to add a temporary volume to my AWS Batch process for a machine learning application.
I have created a Launch Template with a 104 GB drive and specified it, along with the appropriate Amazon ML Docker image, in the compute environment:
Instance types: optimal p3.2xlarge
Ec2 configuration: ECS_AL2_NVIDIA
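For reference, the block-device mapping in my launch template looks roughly like this (the template name, device name, and volume type here are illustrative, not copied from my actual config):

```shell
# Launch template with the extra 104 GB EBS volume attached.
# Names and values are illustrative.
aws ec2 create-launch-template \
  --launch-template-name ml-batch-template \
  --launch-template-data '{
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvdb",
        "Ebs": {
          "VolumeSize": 104,
          "VolumeType": "gp2",
          "DeleteOnTermination": true
        }
      }
    ]
  }'
```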
The instance launches fine, and the launch template appears to be respected; however, I do not know how to use the additional drive. From inside the job I see:
ls -al /dev
drwxr-xr-x 5 root root 380 Jan 23 03:00 .
drwxr-xr-x 1 root root 101 Jan 23 03:00 ..
lrwxrwxrwx 1 root root 11 Jan 23 03:00 core -> /proc/kcore
lrwxrwxrwx 1 root root 13 Jan 23 03:00 fd -> /proc/self/fd
crw-rw-rw- 1 root root 1, 7 Jan 23 03:00 full
drwxrwxrwt 2 root root 40 Jan 23 03:00 mqueue
crw-rw-rw- 1 root root 1, 3 Jan 23 03:00 null
crw-rw-rw- 1 root root 195, 0 Jan 23 02:58 nvidia0
crw-rw-rw- 1 root root 195, 255 Jan 23 02:58 nvidiactl
lrwxrwxrwx 1 root root 8 Jan 23 03:00 ptmx -> pts/ptmx
drwxr-xr-x 2 root root 0 Jan 23 03:00 pts
crw-rw-rw- 1 root root 1, 8 Jan 23 03:00 random
drwxrwxrwt 2 root root 40 Jan 23 03:00 shm
lrwxrwxrwx 1 root root 15 Jan 23 03:00 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root 15 Jan 23 03:00 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root 15 Jan 23 03:00 stdout -> /proc/self/fd/1
crw-rw-rw- 1 root root 5, 0 Jan 23 03:00 tty
crw-rw-rw- 1 root root 1, 9 Jan 23 03:00 urandom
crw-rw-rw- 1 root root 1, 5 Jan 23 03:00 zero
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 30G 0 disk
└─xvda1 202:1 0 30G 0 part /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.57.02
xvdb 202:16 0 104G 0 disk
The 104 GB disk shows up as "xvdb", but there is no associated device file, so I cannot use any of the formatting or mounting tools.
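On a regular EC2 instance I would expect to format and mount the blank volume like this, but with no device node visible these commands have nothing to point at (the device name is the one lsblk reports):

```shell
# What I would normally run on the host to use a fresh EBS volume.
# /dev/xvdb is the name lsblk shows for the 104 GB disk.
sudo mkfs -t ext4 /dev/xvdb      # create a filesystem on the blank volume
sudo mkdir -p /scratch           # create a mount point
sudo mount /dev/xvdb /scratch    # attach the volume to the filesystem tree
```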
I have tried changing the names and drive types in the launch template, but I get exactly the same problem. I have also tried adding various names and source paths to the job definition's "Volumes configuration", but that does not help either.
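One of the job-definition variants I tried looked roughly like this (everything here is illustrative; the sourcePath in particular is a guess, since I do not know where, or whether, the extra drive is mounted on the host):

```shell
# Job definition mapping a host path into the container.
# All names are illustrative; sourcePath is the part I am unsure about.
aws batch register-job-definition \
  --job-definition-name ml-job \
  --type container \
  --container-properties '{
    "image": "<ml-image>",
    "vcpus": 8,
    "memory": 60000,
    "volumes": [
      {"name": "scratch", "host": {"sourcePath": "/scratch"}}
    ],
    "mountPoints": [
      {"containerPath": "/scratch", "sourceVolume": "scratch"}
    ]
  }'
```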
How do I access this drive? Or should I take a different approach?