In general, this exception is raised when the `train_max_wait` argument in your job is set to a value less than `train_max_run`. `train_max_wait` can be set only if `train_use_spot_instances` is `True`, and it must be greater than or equal to `train_max_run`.
I reproduced the error on my end with these arguments. For example:

- `train_use_spot_instances = True`, `train_max_run = 3700`, `train_max_wait = 3700 if train_use_spot_instances else None`: this one succeeded.
- `train_use_spot_instances = True`, `train_max_run = 3700`, `train_max_wait = 3600 if train_use_spot_instances else None`: this one failed with the same error, i.e. `ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: Invalid MaxWaitTimeInSeconds. It must be present and be greater than or equal to MaxRuntimeInSeconds`
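To make the rule concrete, here is a minimal client-side sketch that mirrors it and replays the two configurations above. The function name `check_spot_limits` is hypothetical (it is not part of the SageMaker SDK), and it only encodes the constraint as stated in this answer and in the error message; the real check happens server-side in `CreateTrainingJob`.

```python
from typing import Optional


def check_spot_limits(use_spot: bool, max_run: int, max_wait: Optional[int]) -> None:
    """Hypothetical pre-flight check mirroring the spot-training rule.

    Raises ValueError for the combinations described as invalid above;
    returns None when the configuration satisfies the rule.
    """
    if not use_spot:
        # max_wait may only be set when managed spot training is enabled.
        if max_wait is not None:
            raise ValueError("max_wait can be set only when spot instances are used")
        return
    # With spot enabled, max_wait must be present and >= max_run.
    if max_wait is None or max_wait < max_run:
        raise ValueError(
            "Invalid MaxWaitTimeInSeconds. It must be present and be "
            "greater than or equal to MaxRuntimeInSeconds"
        )


# Replay the two configurations from the reproduction above.
for wait in (3700, 3600):
    try:
        check_spot_limits(use_spot=True, max_run=3700, max_wait=wait)
        print(wait, "ok")
    except ValueError as err:
        print(wait, "rejected:", err)
```

Running a check like this before launching the job surfaces the misconfiguration locally instead of as a `ClientError` from the API.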
To resolve this error, confirm in your job configuration that `train_max_wait` is greater than or equal to `train_max_run`. For more information, refer to the sample notebook: https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/managed_spot_training_object_detection/managed_spot_training_object_detection.ipynb
See also the docs: https://aws.amazon.com/getting-started/hands-on/managed-spot-training-sagemaker/
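One way to make the constraint impossible to violate is to derive `max_wait` from `max_run` instead of setting the two independently. The sketch below is a hypothetical helper (`spot_estimator_kwargs` is not an SDK function). It assumes SageMaker Python SDK v2, where the `Estimator` parameters are named `use_spot_instances`, `max_run`, and `max_wait` (the `train_`-prefixed names are the v1 spellings). It only builds a dictionary, so it runs without the SDK installed.

```python
from typing import Any, Dict


def spot_estimator_kwargs(max_run: int, extra_wait: int = 0) -> Dict[str, Any]:
    """Hypothetical helper building spot-related kwargs for an SDK v2 Estimator.

    max_wait is computed as max_run + extra_wait, so it can never be smaller
    than max_run, which is the condition the ValidationException complains about.
    """
    if max_run <= 0 or extra_wait < 0:
        raise ValueError("max_run must be positive and extra_wait non-negative")
    return {
        "use_spot_instances": True,
        "max_run": max_run,                # max seconds of training time
        "max_wait": max_run + extra_wait,  # max seconds including waiting for spot capacity
    }


kwargs = spot_estimator_kwargs(max_run=3600, extra_wait=1800)
print(kwargs)
# Assumed usage with the SDK (not executed here); the other Estimator
# arguments (image, role, instance settings) are omitted:
# estimator = sagemaker.estimator.Estimator(..., **kwargs)
```

Setting `extra_wait=0` reproduces the working `3700`/`3700` case from the reproduction; any positive value gives the job extra time to wait for spot capacity.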
I tried to follow them and used your same values, but the same error appears. I also tried updating SageMaker in the first cell of the notebook, but it seems to have no effect. I am working with `sagemaker.__version__` = 2.168.0