AWS Rekognition Custom Labels

0

Hi all!

I am currently working with AWS Rekognition for image analysis, and I have a question regarding the use of bounding boxes and image cropping for items that appear consistently in all images. Specifically, I am dealing with plastic luggage seals (example images: https://imgur.com/a/rxUzDTn).

Do I need to use bounding boxes for items that are present in all images? These items are consistent and should be identifiable without the need for spatial information.

Also, is there any benefit to cropping the image to focus solely on the luggage seal during the training phase? Or would it be more beneficial to include the recurring item within a broader context, as it might appear during inference?

I appreciate any insights, best practices, or experiences you can share on this matter.

Thank you in advance for your assistance!

asked 3 months ago, 255 views
3 Answers
1
Accepted Answer

In my experience with Custom Labels, fewer images, each with a bounding box around the full object and no conflicting objects in the frame, give the best performance. CNNs are feature recognition "engines", and Rekognition is very good at learning those features from the complete object during training. At inference, it can then recognize whichever of those features are present, even when the image is not perfect. That has been my experience.
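For anyone supplying bounding boxes through a manifest file rather than drawing them in the console, here is a minimal sketch of building one manifest line. It assumes the SageMaker Ground Truth object-detection layout that Custom Labels documents for manifests; the helper name `manifest_line`, the bucket, the label name, and the job name are all made up for illustration, so check the field names against the current documentation before relying on it.

```python
import json
from datetime import datetime, timezone


def manifest_line(s3_uri, image_w, image_h, boxes, class_map, attr="bounding-box"):
    """Build one JSON Lines record describing an image and its boxes.

    boxes: list of (class_id, left, top, width, height) in pixels.
    class_map: dict mapping class_id to label name.
    """
    annotations = []
    for class_id, left, top, width, height in boxes:
        # Reject boxes that spill outside the image: the box should
        # cover the full object, so it must fit inside the frame.
        if left < 0 or top < 0 or left + width > image_w or top + height > image_h:
            raise ValueError(f"box {(left, top, width, height)} is outside the image")
        annotations.append(
            {"class_id": class_id, "top": top, "left": left,
             "width": width, "height": height}
        )

    record = {
        "source-ref": s3_uri,
        attr: {
            "image_size": [{"width": image_w, "height": image_h, "depth": 3}],
            "annotations": annotations,
        },
        f"{attr}-metadata": {
            "objects": [{"confidence": 1} for _ in annotations],
            "class-map": {str(k): v for k, v in class_map.items()},
            "type": "groundtruth/object-detection",
            "human-annotated": "yes",
            "creation-date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
            "job-name": "manual-labels",  # hypothetical job name
        },
    }
    # A manifest holds one JSON object per line, so no pretty-printing.
    return json.dumps(record)


if __name__ == "__main__":
    print(manifest_line(
        "s3://example-bucket/seals/seal_001.jpg",  # hypothetical location
        640, 480,
        [(0, 100, 50, 200, 100)],
        {0: "luggage_seal"},
    ))
```

Writing one such line per image to a `.manifest` file and importing it is one way to share the labeling work across several people.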

answered 3 months ago
  • Hi Nathan, thanks for your answer.

    In your opinion/experience, what is a good number for 'fewer images'?

1

To answer the question of "fewer images":

  • When I started, I had 2k+ images, and I thought that providing all sorts of images, in various forms and lighting, would give the model superior performance.
  • But what I realized is that two different objects can share some similar "features" AND have some differentiating "features".
  • You have to understand that a CNN (which is what Rek CL is) works by scanning images to identify features, which can be all sorts of things, including edges, etc. (I'm being brief here to avoid a discourse on how CNNs work.)
  • After the initial training I was frustrated with the performance, so I thought about why this State of the Art (SOTA) model wasn't performing.
  • What I realized is that it was learning features from partial images, so when it saw a partial image that HAD some of the common features, it would make a wrong classification decision.
  • So I then went through my training set and removed all images that did NOT have a clear view of ALL the features I wanted it to learn, especially the differentiating features.
  • That exercise took my training set down to approx. 1k images, and my model quality improved SIGNIFICANTLY (recall, precision, F1). For my use case, which is unbalanced, I like F1, as it is the harmonic mean of precision and recall.
  • Going forward, I think I can still improve by spending the time to reduce my training set even further.
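
To make the F1 point concrete, here is a small sketch of how precision, recall, and F1 relate. The function name and the counts in the example are invented for illustration and are not figures from my model.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall, so it stays
    # low whenever either of the two is low.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 8 correct detections, 2 false alarms, 8 misses.
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With a precision of 0.8 and a recall of 0.5, the arithmetic mean would be 0.65, while F1 is about 0.615. That pull toward the weaker of the two numbers is why F1 suits an unbalanced data set.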
answered 3 months ago
  • Thanks for your detailed answer! I am still testing with a much smaller number of images (due to the time-consuming task of setting the bounding boxes), but I will try to ask my colleagues to chip in and test with 500-1k images :)

1

For items that consistently appear in all images, like the plastic luggage seals you mentioned, bounding boxes are not strictly necessary during training, since the model can learn to identify those items regardless of their location in the image.

However, including bounding boxes could still provide some benefits. For example, they help the model learn the typical shape and size of the item.

When deciding whether to crop images or use the full context, either approach has merits. Cropping focuses solely on the item of interest, while full context may resemble real inference conditions more closely. You could experiment with both and see which performs better for your use case.
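
If you want to try a middle ground between the two, one option is to crop around the item while keeping a margin of surrounding context. The sketch below only computes the crop rectangle; the function name and the default `pad_ratio` of 0.25 are assumptions chosen for illustration, not recommended values.

```python
def padded_crop_box(left, top, width, height, image_w, image_h, pad_ratio=0.25):
    """Expand a pixel bounding box by pad_ratio on every side.

    The result is clamped to the image bounds and returned as
    (x0, y0, x1, y1), i.e. left, upper, right, lower edges.
    """
    pad_x = int(round(width * pad_ratio))
    pad_y = int(round(height * pad_ratio))
    x0 = max(0, left - pad_x)
    y0 = max(0, top - pad_y)
    x1 = min(image_w, left + width + pad_x)
    y1 = min(image_h, top + height + pad_y)
    return x0, y0, x1, y1


if __name__ == "__main__":
    # Hypothetical seal at (100, 50), 200x100 px, in a 640x480 image.
    print(padded_crop_box(100, 50, 200, 100, 640, 480))
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed straight to an image library. Varying `pad_ratio` between runs gives a simple way to compare tight crops against wider context.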

During inference, the model will identify items regardless of their location, so there is no need to supply precise bounding boxes at that stage.
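
As a rough illustration of what comes back at inference time, the sketch below reads a response shaped like the output of the Rekognition `DetectCustomLabels` operation. It assumes the documented layout (a `CustomLabels` list with `Name`, `Confidence`, and an optional `Geometry.BoundingBox` expressed as ratios of the image size); the helper name and the sample response are invented, and no AWS call is made here.

```python
def summarize_custom_labels(response, image_w, image_h):
    """Flatten a DetectCustomLabels-style response into simple dicts.

    Box ratios are converted to pixels as (left, top, width, height).
    Labels without geometry get box_px=None.
    """
    results = []
    for label in response.get("CustomLabels", []):
        entry = {
            "name": label["Name"],
            "confidence": label["Confidence"],
            "box_px": None,
        }
        # Geometry is only expected when the model was trained with
        # bounding boxes; image-level models return just the label.
        box = label.get("Geometry", {}).get("BoundingBox")
        if box:
            entry["box_px"] = (
                round(box["Left"] * image_w),
                round(box["Top"] * image_h),
                round(box["Width"] * image_w),
                round(box["Height"] * image_h),
            )
        results.append(entry)
    return results


if __name__ == "__main__":
    # Hand-written sample, not real model output.
    sample = {
        "CustomLabels": [
            {
                "Name": "luggage_seal",
                "Confidence": 97.5,
                "Geometry": {
                    "BoundingBox": {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}
                },
            },
            {"Name": "seal_present", "Confidence": 88.0},
        ]
    }
    for item in summarize_custom_labels(sample, 640, 480):
        print(item)
```

In real use, `response` would come from a call such as boto3's `detect_custom_labels` with your own model version ARN and image.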

I'd recommend starting with full images including typical context during initial training. Then you could optionally experiment with cropping to focus on just the item of interest for later iterations.

EXPERT
answered 3 months ago
