In my experience with Custom Labels, I have found that fewer images, with bounding boxes that highlight the full object and without conflicting objects, provide the best performance. If you think about CNNs, they are feature-recognition "engines": Rekognition is very good at learning those features on the complete object during training, and then at inference it recognizes those features even in an image that is not perfect. That has been my experience.
To answer the question of "fewer images":
- When I started I had 2k+ images, and I thought that providing all sorts of images, in various forms and lighting, would give the model superior performance.
- But what I realized is that two different objects can share some similar "features" AND have some differentiating "features".
- You have to understand how a CNN (which is what Rek CL is) works: it scans images to identify features, which can be all sorts of things, including edges, etc. (I'm being brief here to avoid a discourse on how CNNs work.)
- So after initial training, frustrated with the performance, I thought about why this state-of-the-art (SOTA) model wasn't performing.
- What I realized is that it was learning features from partial images, so when it saw a partial image that HAD some of the common features, it would make a classification decision that was wrong.
- So I went back to my training set and started removing every image that did NOT clearly show ALL the features I wanted it to learn, especially the differentiating features.
- That exercise took my training set down to approximately 1k images, and my model quality improved SIGNIFICANTLY (recall, precision, F1). For my use case, which is unbalanced, I like F1, as it is the harmonic mean of precision and recall.
- Going forward I think I can still improve by spending the time to reduce my training set even further.
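To make the F1 point above concrete, here is a minimal sketch of the metric. The precision/recall values are hypothetical, chosen only to show why the harmonic mean penalizes a lopsided model more than the arithmetic mean would:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A model with high precision but poor recall (hypothetical numbers):
print(f1_score(0.95, 0.40))  # ≈ 0.563, while the arithmetic mean would be 0.675
```

This is why F1 is a reasonable single number to track on an unbalanced use case: it only gets close to 1 when precision and recall are both high.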
Thanks for your detailed answer! I am still testing with a much smaller number of images (due to the time-consuming task of setting the bounding boxes), but I will ask my colleagues to chip in and test with 500-1k images :)
For items that consistently appear in all images like the plastic luggage seals you mentioned, bounding boxes are not strictly necessary during training since the model will learn to identify those items regardless of their location in the image.
However, including bounding boxes could still provide some benefits. For example, it helps the model learn the typical shape and size of the item.
When deciding whether to crop images or use the full context, either approach has merits. Cropping focuses solely on the item of interest, while full context may resemble real inference conditions more closely. You could experiment with both and see which performs better for your use case.
During inference, the model will identify items regardless of location. So there is no need for precise bounding boxes at that stage.
I'd recommend starting with full images including typical context during initial training. Then you could optionally experiment with cropping to focus on just the item of interest for later iterations.
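The crop-vs-full-context experiment described above can be sketched with Pillow. The paths and box coordinates here are hypothetical placeholders; in a real Custom Labels workflow the boxes would come from your dataset's bounding-box annotations:

```python
from PIL import Image

def crop_to_box(src_path: str, box: tuple[int, int, int, int], dst_path: str) -> None:
    """Save a copy of the image cropped to the (left, top, right, bottom) box.

    Used to build a "cropped" variant of a training set so it can be
    compared against training on the full images.
    """
    with Image.open(src_path) as img:
        img.crop(box).save(dst_path)

# Hypothetical usage: produce a cropped copy of one annotated image.
# crop_to_box("images/seal_001.jpg", (120, 80, 340, 260), "cropped/seal_001.jpg")
```

Training one model on the full images and one on the cropped copies, then comparing their precision/recall/F1 on the same test set, is the simplest way to see which approach suits your use case.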
Hi Nathan, thanks for your answer.
In your opinion/experience, what is a good number for 'fewer images'?