SageMaker Training Job Error - "Checkpoint hyperparameters are missing. Please check the checkpoint hyperparameters file exists on S3., exit code: 2"


Hi,

I am using SageMaker for a computer vision project. The goal is to train an object detection model on SageMaker and deploy it to an endpoint. Following the AWS instructions, we prepared a dataset of image files and a *.manifest file in a new S3 bucket in the same region as the SageMaker notebook.
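Before training, a quick sanity check of the manifest can rule out data problems. The sketch below is standard-library Python only. It assumes the Ground Truth augmented manifest layout for bounding boxes: one JSON object per line, with a `source-ref` S3 URI and a label attribute holding `annotations`. The attribute name `bounding-box` is a hypothetical placeholder. Ground Truth names it after the labeling job, so substitute the key your manifest actually uses.

```python
import json

# Hypothetical label attribute name; replace with the key used in your manifest.
LABEL_ATTRIBUTE = "bounding-box"
REQUIRED_BOX_KEYS = {"class_id", "left", "top", "width", "height"}


def validate_manifest_lines(lines, label_attribute=LABEL_ATTRIBUTE):
    """Return (line_number, problem) tuples for an augmented manifest.

    Each non-empty line should be a standalone JSON object with an S3
    "source-ref" and a label attribute that contains "annotations".
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines such as a trailing newline
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        source = record.get("source-ref")
        if not (isinstance(source, str) and source.startswith("s3://")):
            problems.append((number, "missing or non-S3 'source-ref'"))
        label = record.get(label_attribute)
        if not isinstance(label, dict) or "annotations" not in label:
            problems.append((number, f"missing '{label_attribute}.annotations'"))
            continue
        for box in label["annotations"]:
            # Every box needs a class id plus its pixel-space rectangle.
            missing = REQUIRED_BOX_KEYS - set(box)
            if missing:
                problems.append((number, f"annotation missing {sorted(missing)}"))
    return problems
```

Usage, with a hypothetical local copy of the manifest: `with open("train.manifest") as f: print(validate_manifest_lines(f))`. An empty list means every line passed these structural checks. It does not verify that the referenced images exist in S3.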

We use the notebook (http://aws-tc-largeobjects.s3-us-west-2.amazonaws.com/DIG-TF-200-MLBEES-10-EN/demo.ipynb), which we downloaded from a link in an AWS YouTube video (https://www.youtube.com/watch?v=OFlu6Gd7CrQ).

We followed the notebook's instructions to load the images and the *.manifest file, ran the code, and then created a training job. The job failed repeatedly with the following error:

"Failure reason ClientError: Cannot resume training. Checkpoint hyperparameters are missing. Please check the checkpoint hyperparameters file exists on S3., exit code: 2"

The instance type used is p2.xlarge.
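To make the job definition explicit, the training job request can be built as a plain dictionary and passed to boto3's `create_training_job`. The sketch below assumes the low-level CreateTrainingJob API rather than the console. It also assumes Pipe mode with RecordIO wrapping, which augmented manifest input for the built-in object detection algorithm uses. The job name, role ARN, image URI, S3 paths, volume size and runtime limit are hypothetical placeholders. The request deliberately omits `CheckpointConfig` and any `model` channel. Whether that avoids this particular error is an assumption, not something confirmed here.

```python
def build_training_request(job_name, role_arn, image_uri, train_manifest,
                           validation_manifest, output_path, hyperparameters,
                           label_attribute="bounding-box",
                           instance_type="ml.p2.xlarge"):
    """Build a CreateTrainingJob request dict (all names are placeholders)."""

    def channel(name, s3_uri):
        # One augmented-manifest channel streamed in Pipe mode as RecordIO.
        return {
            "ChannelName": name,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "AugmentedManifestFile",
                    "S3Uri": s3_uri,
                    "S3DataDistributionType": "FullyReplicated",
                    # Attributes read from each manifest line: image, then labels.
                    "AttributeNames": ["source-ref", label_attribute],
                }
            },
            "ContentType": "application/x-recordio",
            "RecordWrapperType": "RecordIO",
            "CompressionType": "None",
            "InputMode": "Pipe",
        }

    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            channel("train", train_manifest),
            channel("validation", validation_manifest),
        ],
        "OutputDataConfig": {"S3OutputPath": output_path},
        # The API spells the instance type with the "ml." prefix.
        # VolumeSizeInGB and MaxRuntimeInSeconds are hypothetical values.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
        # The API expects a string-to-string map; values must already be strings.
        "HyperParameters": dict(hyperparameters),
        # No "CheckpointConfig" key and no "model" channel: the job starts fresh.
    }
```

Usage: `boto3.client("sagemaker").create_training_job(**build_training_request(...))`, with real values substituted for every placeholder.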

I have no idea what this error means or what a checkpoint hyperparameters file is. I checked my S3 bucket, and no hyperparameters file exists there.
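"Cannot resume training" suggests the algorithm believed it was resuming from earlier state. One thing worth ruling out is whether the failed job's definition contains settings that point it at prior state. Whether any of these settings actually triggers this error is an assumption, not confirmed in this thread. Candidate settings include a checkpoint S3 location, an input channel named `model` (used for incremental training), or managed spot training. The sketch below inspects the dictionary returned by boto3's `describe_training_job`. It is written as a pure function, so it can be tried on a hand-made dictionary first.

```python
def resume_related_settings(description):
    """List settings in a DescribeTrainingJob response that may make a
    built-in algorithm try to resume from earlier state (a hypothesis to check)."""
    findings = []
    checkpoint = description.get("CheckpointConfig") or {}
    if checkpoint.get("S3Uri"):
        findings.append(f"CheckpointConfig.S3Uri = {checkpoint['S3Uri']}")
    for ch in description.get("InputDataConfig", []):
        # A channel named "model" is how incremental training supplies a prior model.
        if ch.get("ChannelName") == "model":
            findings.append("input channel named 'model' (incremental training)")
    if description.get("EnableManagedSpotTraining"):
        findings.append("managed spot training is enabled")
    reason = description.get("FailureReason", "")
    if "Cannot resume training" in reason:
        findings.append(f"FailureReason: {reason}")
    return findings
```

Usage: `desc = boto3.client("sagemaker").describe_training_job(TrainingJobName="your-job-name")`, then `print(resume_related_settings(desc))`. If a checkpoint URI shows up, recreating the job without it, or with an empty prefix, would be a reasonable next experiment.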

I checked that all hyperparameters were set correctly during job creation. Here is the list from the job report:

Hyperparameters:

| Key | Value |
|---|---|
| base_network | resnet-50 |
| early_stopping | false |
| early_stopping_min_epochs | 10 |
| early_stopping_patience | 5 |
| early_stopping_tolerance | 0.0 |
| epochs | 30 |
| freeze_layer_pattern | false |
| image_shape | 300 |
| label_width | 350 |
| learning_rate | 0.001 |
| lr_scheduler_factor | 0.1 |
| mini_batch_size | 1 |
| momentum | 0.9 |
| nms_threshold | 0.45 |
| num_classes | 1 |
| num_training_samples | 400 |
| optimizer | adam |
| overlap_threshold | 0.5 |
| use_pretrained_model | 1 |
| weight_decay | 0.0005 |
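The CreateTrainingJob API takes `HyperParameters` as a map of strings to strings, so values typed as Python numbers or booleans need converting first. The sketch below encodes the values from the report above and converts them. Booleans are lowercased to `true`/`false`, matching how the report shows them. That lowercase spelling is an assumption about what the algorithm expects. `freeze_layer_pattern` is kept as the string `"false"` exactly as reported.

```python
# Values copied from the job report above.
REPORTED = {
    "base_network": "resnet-50",
    "early_stopping": False,
    "early_stopping_min_epochs": 10,
    "early_stopping_patience": 5,
    "early_stopping_tolerance": 0.0,
    "epochs": 30,
    "freeze_layer_pattern": "false",
    "image_shape": 300,
    "label_width": 350,
    "learning_rate": 0.001,
    "lr_scheduler_factor": 0.1,
    "mini_batch_size": 1,
    "momentum": 0.9,
    "nms_threshold": 0.45,
    "num_classes": 1,
    "num_training_samples": 400,
    "optimizer": "adam",
    "overlap_threshold": 0.5,
    "use_pretrained_model": 1,
    "weight_decay": 0.0005,
}


def to_sagemaker_hyperparameters(params):
    """Convert a dict of Python values into the string map the API expects."""
    out = {}
    for key, value in params.items():
        # Check bool before anything else: bool is a subclass of int in Python.
        if isinstance(value, bool):
            out[key] = "true" if value else "false"
        else:
            out[key] = str(value)
    return out
```

Usage: pass `to_sagemaker_hyperparameters(REPORTED)` as the `HyperParameters` argument of the training job request.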

Thanks for any help!
