SageMaker TensorFlow Object Detection: Null annotations raise an exception

Hi there, nice to meet you all,

I've been trying to train an object detection model (using the built-in algorithms, TensorFlow), following the JumpStart examples as a template. As soon as I provide a null annotation, SageMaker fails to train when calling fit() and throws the following error (the entire stack trace is at the end of the post):

ValueError: Invalid dimensions for box data: (0,)

As I understand it, according to that example, annotations.json should have a COCO-like structure for SageMaker to consider it valid. Quoting the same tutorial:

The annotations.json file should have information for bounding_boxes and their class labels. It should have a dictionary with keys "images" and "annotations". Value for the "images" key should be a list of entries, one for each image of the form {"file_name": image_name, "height": height, "width": width, "id": image_id}. Value of the 'annotations' key should be a list of entries, one for each bounding box of the form {"image_id": image_id, "bbox": [xmin, ymin, xmax, ymax], "category_id": bbox_label}.

That is, as far as I know, the COCO format (it should be mentioned explicitly, I think!).

Great! So what is the issue here? If I provide a dataset with an equal number of images and annotations, I can train my model successfully. But if I have an image with no object in it, then I will have more images than annotations. For instance:

0001.jpeg has an object and the corresponding annotation

0002.jpeg has no object, so it doesn't have an annotation

So the annotations.json file looks like:

{
  "images": [
    { "file_name": "0001.jpeg", "height": 1944, "width": 2592, "id": "0001" },
    { "file_name": "0002.jpeg", "height": 1944, "width": 2592, "id": "0002" }
  ],
  "annotations": [
    { "image_id": "0001", "bbox": [ 688, 371, 1859, 1581 ], "category_id": 0 }
  ]
}
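To make the mismatch concrete, here is a minimal sketch that lists the images with no entry under "annotations". The helper name find_unannotated_images is hypothetical (it is not part of SageMaker or JumpStart), and the example file is inlined so the snippet runs on its own.

```python
import json
from collections import Counter


def find_unannotated_images(dataset):
    """Return file names of images that have no entry under "annotations".

    Hypothetical helper for illustration; not a SageMaker or JumpStart API.
    """
    # Count how many bounding boxes reference each image id.
    boxes_per_image = Counter(ann["image_id"] for ann in dataset["annotations"])
    # Counter returns 0 for ids it has never seen, so missing images are easy to spot.
    return [
        image["file_name"]
        for image in dataset["images"]
        if boxes_per_image[image["id"]] == 0
    ]


# The two-image example from this post.
example = json.loads("""
{
  "images": [
    {"file_name": "0001.jpeg", "height": 1944, "width": 2592, "id": "0001"},
    {"file_name": "0002.jpeg", "height": 1944, "width": 2592, "id": "0002"}
  ],
  "annotations": [
    {"image_id": "0001", "bbox": [688, 371, 1859, 1581], "category_id": 0}
  ]
}
""")

print(find_unannotated_images(example))  # prints ['0002.jpeg']
```

The same function should work on any file with "images" and "annotations" keys loaded via json.load, which makes it easy to confirm which images trigger the failure.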

As far as I could investigate, this is the standard procedure when an image contains no objects. I also downloaded the full COCO 2017 annotations to make sure of this: you can download "2017 Train/Val annotations [241MB]", open instances_val2017.json, and look for the images with IDs 25593, 41488, 42888 ... You will see that they have entries under "images" but none under "annotations".

So, I would like to kindly ask for your help, so that I can train my model properly!

Thanks in advance!

P.S:

Traceback

[Epoch 0], Speed: 0.058 samples/sec, loss=431737.90625.
Traceback (most recent call last):
  File "/opt/ml/code/transfer_learning.py", line 246, in <module>
    run_with_args(args)
  File "/opt/ml/code/transfer_learning.py", line 201, in run_with_args
    train_and_save_model(
  File "/opt/ml/code/train.py", line 130, in train_and_save_model
    validation_losses = run_validation(detection_model, validation_data, batch_size, image_size, epoch)
  File "/opt/ml/code/validation.py", line 25, in run_validation
    losses_dict = model.loss(prediction_dict, shapes)
  File "/opt/ml/code/object_detection/meta_architectures/ssd_meta_arch.py", line 824, in loss
    ) = self._assign_targets(
  File "/opt/ml/code/object_detection/meta_architectures/ssd_meta_arch.py", line 1013, in _assign_targets
    groundtruth_boxlists = [box_list.BoxList(boxes) for boxes in groundtruth_boxes_list]
  File "/opt/ml/code/object_detection/meta_architectures/ssd_meta_arch.py", line 1013, in <listcomp>
    groundtruth_boxlists = [box_list.BoxList(boxes) for boxes in groundtruth_boxes_list]
  File "/opt/ml/code/object_detection/core/box_list.py", line 55, in __init__
    raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
ValueError: Invalid dimensions for box data: (0,)
2023-03-09 21:59:57,046 sagemaker-training-toolkit INFO     Waiting for the process to finish and give a return code.
2023-03-09 21:59:57,047 sagemaker-training-toolkit INFO     Done waiting for a return code. Received 1 from exiting process.
2023-03-09 21:59:57,048 sagemaker-training-toolkit ERROR    Reporting training FAILURE
2023-03-09 21:59:57,048 sagemaker-training-toolkit ERROR    ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
 ValueError: Invalid dimensions for box data: (0,)"
Command "/usr/local/bin/python3.9 transfer_learning.py --batch_size 5 --beta_1 0.9 --beta_2 0.999 --early_stopping False --early_stopping_min_delta 0.0 --early_stopping_patience 5 --epochs 10 --epsilon 1e-07 --initial_accumulator_value 0.1 --learning_rate 0.001 --momentum 0.9 --optimizer adam --reinitialize_top_layer Auto --rho 0.95 --train_only_top_layer False"
2023-03-09 21:59:57,048 sagemaker-training-toolkit ERROR    Encountered exit_code 1

2023-03-09 22:00:14 Uploading - Uploading generated training model
2023-03-09 22:00:20 Failed - Training job failed
-----------------------------------------------------------------------------------------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
Cell In[19], line 3
      1 #with load_run(experiment_name=demo_experiment.experiment_name, run_name=demo_trial.trial_name) as run:
      2     #run.log_parameter("param1", "value1")
----> 3 od_estimator.fit(
      4         {"training": train_path},
      5         # {"training": train_path, "validation": validation_path}}, 
      6         logs=True, 
      7         job_name=training_job_name, 
      8         experiment_config = {
      9             # "ExperimentName"
     10             "TrialName" : demo_trial.trial_name,
     11             "TrialComponentDisplayName" : "TrainingJob",
     12     })

File /opt/conda/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py:272, in runnable_by_pipeline.<locals>.wrapper(*args, **kwargs)
    268         return context
    270     return _StepArguments(retrieve_caller_name(self_instance), run_func, *args, **kwargs)
--> 272 return run_func(*args, **kwargs)

File /opt/conda/lib/python3.8/site-packages/sagemaker/estimator.py:1163, in EstimatorBase.fit(self, inputs, wait, logs, job_name, experiment_config)
   1161 self.jobs.append(self.latest_training_job)
   1162 if wait:
-> 1163     self.latest_training_job.wait(logs=logs)

File /opt/conda/lib/python3.8/site-packages/sagemaker/estimator.py:2311, in _TrainingJob.wait(self, logs)
   2309 # If logs are requested, call logs_for_jobs.
   2310 if logs != "None":
-> 2311     self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   2312 else:
   2313     self.sagemaker_session.wait_for_job(self.job_name)

File /opt/conda/lib/python3.8/site-packages/sagemaker/session.py:4176, in Session.logs_for_job(self, job_name, wait, poll, log_type)
   4173             last_profiler_rule_statuses = profiler_rule_statuses
   4175 if wait:
-> 4176     self._check_job_status(job_name, description, "TrainingJobStatus")
   4177     if dot:
   4178         print()

File /opt/conda/lib/python3.8/site-packages/sagemaker/session.py:3707, in Session._check_job_status(self, job, desc, status_key_name)
   3701 if "CapacityError" in str(reason):
   3702     raise exceptions.CapacityError(
   3703         message=message,
   3704         allowed_statuses=["Completed", "Stopped"],
   3705         actual_status=status,
   3706     )
-> 3707 raise exceptions.UnexpectedStatusException(
   3708     message=message,
   3709     allowed_statuses=["Completed", "Stopped"],
   3710     actual_status=status,
   3711 )

UnexpectedStatusException: Error for Training job TrainNullAnnotations-tensorflow-od1-ssd-2023-03-09-21-48-59-470: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise ValueError("Invalid dimensions for box data: {}".format(boxes.shape))
 ValueError: Invalid dimensions for box data: (0,)"
Command "/usr/local/bin/python3.9 transfer_learning.py --batch_size 5 --beta_1 0.9 --beta_2 0.999 --early_stopping False --early_stopping_min_delta 0.0 --early_stopping_patience 5 --epochs 10 --epsilon 1e-07 --initial_accumulator_value 0.1 --learning_rate 0.001 --momentum 0.9 --optimizer adam --reinitialize_top_layer Auto --rho 0.95 --train_only_top_layer False", exit code: 1
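A note on reading the error message: the shape (0,) suggests that the image without annotations reaches box_list.BoxList as an empty one-dimensional array rather than an empty array of shape (0, 4). The sketch below only mimics that kind of shape check in plain Python. The function names check_box_shape and describe are hypothetical, and the exact condition inside the container's box_list.py is an assumption based on the message in the traceback.

```python
def check_box_shape(shape):
    """Hypothetical stand-in for the shape check behind the error message."""
    # Assumption: boxes must form a rank-2 array whose last dimension is 4,
    # i.e. shape (N, 4) with one row of four coordinates per box.
    if len(shape) != 2 or shape[-1] != 4:
        raise ValueError("Invalid dimensions for box data: {}".format(shape))


def describe(shape):
    """Return "accepted" or the error message produced for the given shape."""
    try:
        check_box_shape(shape)
        return "accepted"
    except ValueError as err:
        return str(err)


print(describe((1, 4)))  # one box: accepted
print(describe((0,)))    # empty 1-D array: Invalid dimensions for box data: (0,)
print(describe((0, 4)))  # zero boxes with four columns: accepted
```

If that assumption holds, an image with zero boxes would only pass the check when its box array keeps the second dimension, which would have to be handled inside the training script rather than in annotations.json.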
