How do I reset Amazon SageMaker Studio Lab & wipe all data? ([Errno 28] No space left on device: '). All kernels fail & auto-restart fails.


SageMaker Studio Lab has stopped working. First, all kernels fail with an unknown error and can't be restarted, or they get stuck in a loop restarting themselves. I am also unable to create a new kernel. Second, this error keeps popping up: "Unexpected error while saving file: aws/notebooks/MLA-TAB-DAY3-AUTOML.ipynb [Errno 28] No space left on device: '"

When I check resources, it shows that 100% of storage is used at /home/studio-lab-user, but I am unable to delete any of the files in that directory or the directory itself.

/dev/nvme0n1p1 50G 17G 34G 33% /opt/.sagemakerinternal
/dev/nvme1n1 25G 25G 36K 100% /home/studio-lab-user

How can I just reset the entire SageMaker Studio Lab profile to default, erase all the data that's clogging up storage, and start fresh?

asked 2 years ago · 5075 views
3 Answers

Hello, if you get a notification that your disk is full while you're trying to create or import a file, you can delete files to free up space.

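To remove all of your files and reset your project, run the following command from the terminal. Note that rm refuses to delete "." itself, so point it at the contents of your home directory instead; the second pattern also catches hidden files and directories such as .cache and .conda.

cd /home/studio-lab-user && rm -rf ./* ./.[!.]*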
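If you want the reset confined to one directory and repeatable, here is a minimal sketch; wipe_dir_contents is a hypothetical helper name, not a Studio Lab tool. It deletes everything inside the given directory, hidden entries included, but keeps the directory itself. It is destructive: in Studio Lab the conda environments live under /home/studio-lab-user/.conda, so they go too.

```shell
# Hypothetical helper: delete every entry inside a directory (hidden ones
# included) while keeping the directory itself.
wipe_dir_contents() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # -mindepth 1 skips $dir itself; -maxdepth 1 hands each top-level entry
  # to rm exactly once, and rm -rf removes it recursively.
  find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
}

# Usage in the Studio Lab terminal (destructive, also removes ~/.conda):
#   wipe_dir_contents /home/studio-lab-user
#   df -h /home/studio-lab-user
```

Using find instead of shell globs avoids having to spell out separate patterns for dotfiles.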

The following command deletes a conda environment from your project.

conda remove --name <ENVIRONMENT_NAME> --all
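Before removing an environment, it can help to see which ones exist. A minimal sketch, assuming conda is on PATH in the Studio Lab terminal; remove_conda_env is a hypothetical helper name, not part of conda. It lists the environments, refuses to remove the one that is currently active, and then runs the same conda remove command non-interactively.

```shell
# Hypothetical helper wrapping "conda remove --name <ENV> --all".
remove_conda_env() {
  local env_name="$1"
  if [ -z "$env_name" ]; then
    echo "usage: remove_conda_env <ENVIRONMENT_NAME>" >&2
    return 2
  fi
  # CONDA_DEFAULT_ENV holds the name of the currently activated environment.
  if [ "${CONDA_DEFAULT_ENV:-}" = "$env_name" ]; then
    echo "deactivate '$env_name' first (conda deactivate)" >&2
    return 1
  fi
  conda env list                                   # show what exists
  conda remove --name "$env_name" --all --yes      # --yes skips the prompt
}

# Usage: remove_conda_env <ENVIRONMENT_NAME>
```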

AWS
answered 2 years ago

The command below removes all files from the SageMaker Studio Lab instance, as stated by Aleksandr_P:

rm -rf /home/studio-lab-user/

However, this command is not ideal, because it also deletes packages installed in the user's conda environment, such as the clear command:

(studiolab) studio-lab-user@default:~/sagemaker-studiolab-notebooks$ clear
bash: /home/studio-lab-user/.conda/envs/studiolab/bin/clear: No such file or directory
(studiolab) studio-lab-user@default:~/sagemaker-studiolab-notebooks$ 

The shell can no longer find the clear command.
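A gentler variant, sketched under the assumption that the lost commands live only under .conda (the error path above points to /home/studio-lab-user/.conda/envs/studiolab): clear everything in the home directory except .conda, so the studiolab environment survives. wipe_dir_except_conda is a hypothetical name, and if .conda itself is what fills the disk, this will not free enough space.

```shell
# Hypothetical helper: remove every top-level entry in a directory except
# the one named .conda, which holds the conda environments.
wipe_dir_except_conda() {
  local dir="$1"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # ! -name .conda excludes that single entry; everything else is deleted.
  find "$dir" -mindepth 1 -maxdepth 1 ! -name .conda -exec rm -rf -- {} +
}

# Usage: wipe_dir_except_conda /home/studio-lab-user
```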

answered a year ago
  • One can restart the instance in order to restore the lost packages.


The best approach probably depends on what you were doing with the instance. Personally, I'd dig into the file system a bit to see what is eating up your disk space and remove unneeded files. There are surely more "nuclear" options that completely wipe and restart the instance; the following assumes you want to keep the instance running.

For example, I was running some ML workflows in a notebook that involved a lot of model downloads, which ate up disk space, and I ran out of space pretty quickly. So I first checked whether temp files were accumulating (I knew they were; I just wasn't sure where).

cd /home/studio-lab-user
ls -la

The second command lists all files and directories, including hidden ones whose names start with a dot. That revealed some application-specific cache directories, conda environments, etc. Then I ran

du -h -d1

to display human-readable (-h) disk usage one level down (-d1) from the current directory. In my case, there was a temporary .cache directory using 20 GB that I didn't need, so I removed it.

rm -rf .cache
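The dig-then-delete workflow above can be sketched as two small shell helpers. Both names are hypothetical, and the sketch assumes GNU coreutils (du -sh, sort -h), which the Linux terminal in Studio Lab should provide. largest_entries lists the biggest top-level entries, hidden ones included, and remove_and_report deletes one of them and reports roughly how much space it freed.

```shell
# Hypothetical helper: list the N largest top-level entries under a
# directory (dotfiles included), biggest first.
largest_entries() {
  local dir="${1:-.}" n="${2:-10}"
  # du -sh prints "SIZE<TAB>PATH" per entry; sort -rh orders human-readable sizes.
  find "$dir" -mindepth 1 -maxdepth 1 -exec du -sh -- {} + 2>/dev/null \
    | sort -rh | head -n "$n"
}

# Hypothetical helper: delete one file or directory and report the space it
# used, measured with du in KiB just before deletion.
remove_and_report() {
  local target="$1"
  [ -e "$target" ] || { echo "nothing to remove: $target" >&2; return 1; }
  local kib
  kib=$(du -sk -- "$target" | cut -f1)
  rm -rf -- "$target"
  echo "removed $target, freed about ${kib} KiB"
}

# Usage:
#   largest_entries /home/studio-lab-user 5
#   remove_and_report /home/studio-lab-user/.cache
#   df -h /home/studio-lab-user
```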

Hope this helps!

daveyb
answered 10 months ago
