Downloading SageMaker training image takes 1 hour

For the last few days, my training jobs have blown out in duration, and the logs show the training image taking over 1 hour to download. I'm using spot instances for training. Is this a symptom of that? It seems unlikely: I'd assumed that if a spot instance wasn't available I'd get some other error, or at least the job wouldn't have started preparing the instances. I'm using the HuggingFace estimator with the following settings; the job log is below them:

        transformers_version="4.28",  # Transformers version
        pytorch_version="2.0",  # PyTorch version
        py_version="py310",  # Python version

16:10:41  2023-10-10 05:10:10 Starting - Starting the training job...
16:11:41  2023-10-10 05:10:29 Starting - Preparing the instances for training......
16:12:11  2023-10-10 05:11:26 Downloading - Downloading input data...
16:19:14  2023-10-10 05:11:47 Training - Downloading the training image..........................................
17:27:40  2023-10-10 05:18:44 Training - Training image download completed. Training in progress.........................................................................................................................................................................................................................................................................................................................................................................................................................
17:28:42  2023-10-10 06:27:30 Uploading - Uploading generated training model......
17:28:42  2023-10-10 06:28:16 Completed - Training job completed
Dave
asked 7 months ago, 305 views
1 Answer

Actually, now that I look closer, I think the logs are "wrong": going by when the lines were printed, the total time spent "training" was only 1 minute, yet it normally takes 1 hour to train with checkpoints, or a minimum of 10 minutes if I kick off the same job using checkpoints. So perhaps the log output isn't being flushed correctly.
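
One way to check this is to compare the two timestamps on each log line. The following is a minimal sketch, not a definitive diagnosis. It assumes the first column is the local time at which the line was printed and the second is the status timestamp reported by SageMaker; neither is confirmed in the post. `phase_durations` is a hypothetical helper, and the progress dots are shortened.

```python
from datetime import datetime

# Log lines from the question, progress dots shortened.
# Assumed layout: "<local print time>  <SageMaker status timestamp> <Status> - <message>"
LOG = """\
16:10:41  2023-10-10 05:10:10 Starting - Starting the training job...
16:11:41  2023-10-10 05:10:29 Starting - Preparing the instances for training...
16:12:11  2023-10-10 05:11:26 Downloading - Downloading input data...
16:19:14  2023-10-10 05:11:47 Training - Downloading the training image...
17:27:40  2023-10-10 05:18:44 Training - Training image download completed. Training in progress...
17:28:42  2023-10-10 06:27:30 Uploading - Uploading generated training model...
17:28:42  2023-10-10 06:28:16 Completed - Training job completed
"""


def phase_durations(log):
    """Return (message, seconds by status timestamp, seconds by print time) per phase."""
    events = []
    for line in log.strip().splitlines():
        local, date, time, rest = line.split(None, 3)
        printed = datetime.strptime(local, "%H:%M:%S")  # time of day only
        reported = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        message = rest.split(" - ", 1)[1].rstrip(".")
        events.append((message, reported, printed))

    durations = []
    # A phase lasts from its own line until the next line.
    for (message, rep_a, prt_a), (_, rep_b, prt_b) in zip(events, events[1:]):
        durations.append((
            message,
            (rep_b - rep_a).total_seconds(),
            (prt_b - prt_a).total_seconds(),
        ))
    return durations


for message, by_status, by_print in phase_durations(LOG):
    print(f"{by_status:>7.0f}s reported  {by_print:>7.0f}s printed  {message}")
```

On the log above, the status timestamps put the image download at 417 s (about 7 minutes) and training at 4126 s (about 69 minutes), while the print times attribute 4106 s to the download line and 62 s to training. Under the stated assumption, that is consistent with delayed log output rather than a slow image download.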

Dave
answered 7 months ago
