How to create a Custom Labels dataset by supplying a manifest programmatically

Hello,

From https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/cd-create-dataset.html
I see that I can create a manifest without using SageMaker, as long as it conforms to the format specified here: https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/cd-required-fields.html

However, the Custom Labels guide only shows how to supply the manifest through the console, by selecting "Import images labeled by SageMaker Ground Truth".

Is there a way to create or modify a dataset and supply my manifest programmatically?

Thanks.

Edited by: mymingle on Mar 2, 2020 5:48 PM

asked 4 years ago, 408 views
7 Answers
Accepted Answer

Hey mymingle

The console is just one way to kick off your training. It provides an end-to-end experience: upload images, annotate them as required, train, and view the results.

You can also create the manifest file by any other means. As long as it meets the format requirements the training will be able to consume it. You can find the format here - https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/cd-manifest-files.html

Please note the difference in the format for classification (no bounding boxes) and object detection (one or more bounding boxes per image). From your previous posts it looks like you have bounding boxes. If that is the case, refer to the section "Required Fields for Object Detection" in the above link.
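As a rough illustration of the object detection case, here is a minimal Python sketch that builds one manifest line (one JSON object per image, one image per line). The field layout follows the SageMaker Ground Truth object detection format as I understand it from the linked docs, so please verify it against "Required Fields for Object Detection". The function name, the default label attribute "bounding-box", the job name, and the bucket and label values are placeholders, not anything prescribed by the service.

```python
import json


def object_detection_manifest_line(image_s3_uri, image_size, boxes, class_map,
                                   label_attr="bounding-box",
                                   creation_date="2020-03-02T00:00:00",
                                   job_name="my-import-job"):
    """Return one JSON Lines entry describing one image and its boxes.

    image_size: (width, height, depth) in pixels / channels
    boxes:      list of (class_id, left, top, width, height) in pixels
    class_map:  {class_id: label name}
    All default values are illustrative placeholders.
    """
    width, height, depth = image_size
    annotations = [
        {"class_id": cid, "left": left, "top": top, "width": w, "height": h}
        for cid, left, top, w, h in boxes
    ]
    entry = {
        "source-ref": image_s3_uri,
        label_attr: {
            "image_size": [{"width": width, "height": height, "depth": depth}],
            "annotations": annotations,
        },
        # The metadata attribute is the label attribute name plus "-metadata".
        label_attr + "-metadata": {
            # One confidence entry per bounding box, in the same order.
            "objects": [{"confidence": 1} for _ in boxes],
            "class-map": {str(cid): name for cid, name in class_map.items()},
            "type": "groundtruth/object-detection",
            "human-annotated": "yes",
            "creation-date": creation_date,
            "job-name": job_name,
        },
    }
    # json.dumps emits a single line, which is what a JSON Lines file needs.
    return json.dumps(entry)
```

Calling it once per image and joining the results with newlines gives the manifest body, which you would then upload to S3.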

As you build your file, please ensure it also meets the validation rules specified here - https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/cd-manifest-files-validation-rules.html

Once you have a manifest file ready, you can kick off the training via the CLI by:

  1. First creating a project
    https://docs.aws.amazon.com/rekognition/latest/dg/API_CreateProject.html
  2. Creating a project version under this project
    https://docs.aws.amazon.com/rekognition/latest/dg/API_CreateProjectVersion.html
    This API kicks off the actual training process.
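The same two steps can be sketched with the Python SDK. This is only a sketch: it assumes a boto3 Rekognition client is passed in, that both manifests already sit in the same S3 bucket, and that the parameter names match the CreateProject and CreateProjectVersion references linked above (check the current API reference before relying on it). The function name and all argument values are placeholders.

```python
def start_training(client, project_name, version_name, bucket,
                   train_manifest_key, test_manifest_key, output_prefix):
    """Create a project, then a project version, which starts training.

    `client` is expected to behave like boto3.client("rekognition").
    Returns the ARN of the new project version.
    """
    # Step 1: create the project and keep its ARN.
    project_arn = client.create_project(ProjectName=project_name)["ProjectArn"]

    def manifest_asset(key):
        # Points the service at a manifest file stored in S3.
        return {"GroundTruthManifest": {"S3Object": {"Bucket": bucket, "Name": key}}}

    # Step 2: create a project version; this call kicks off training.
    response = client.create_project_version(
        ProjectArn=project_arn,
        VersionName=version_name,
        OutputConfig={"S3Bucket": bucket, "S3KeyPrefix": output_prefix},
        TrainingData={"Assets": [manifest_asset(train_manifest_key)]},
        TestingData={"Assets": [manifest_asset(test_manifest_key)]},
    )
    return response["ProjectVersionArn"]


# Example usage (needs the boto3 package and AWS credentials; names are made up):
#   import boto3
#   arn = start_training(boto3.client("rekognition"), "my-project", "v1",
#                        "my-bucket", "manifests/train.manifest",
#                        "manifests/test.manifest", "training-output/")
```

Passing the client in as an argument keeps the function easy to try out with a stub before pointing it at a real account.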

You have correctly pointed to the doc link for doing the same thing via the SDK.

AWS
answered 4 years ago

Hello,

Of course there is a way! ;)

I did this a month or two ago and have just written a blog post about it, using the OpenImages dataset as an example.

You can check it here https://www.legiasquad.com/articles/retraining-aws-rekognition-with-custom-labels-on-an-annotated-dataset/

The post also includes the code sample that I used to generate the manifest :)

I hope this helps!

answered 4 years ago

Hi, I have a manifest built from my dataset. The ask is to be able to run a script so that bboxes are created from the manifest prior to training, and without needing to interact with any GUI, e.g. clicking on the radio button "Import images labeled by SageMaker Ground Truth". Thanks all the same.

answered 4 years ago

Hello

If you have a manifest (with or without bounding boxes) in the prescribed format, you can kick off a training without the console by using the CLI via the create-project-version command:
https://docs.aws.amazon.com/cli/latest/reference/rekognition/create-project-version.html

You can also use the SDK to programmatically kick off a training using the CreateProjectVersion API: https://docs.aws.amazon.com/rekognition/latest/dg/API_CreateProjectVersion.html
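Since CreateProjectVersion returns right away while training continues in the background, a script usually also needs to check when training has finished. A minimal sketch, assuming a boto3 Rekognition client and the DescribeProjectVersions API; the function name, the polling interval, and the retry budget are arbitrary choices for illustration.

```python
import time


def wait_for_training(client, project_arn, version_name,
                      poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll DescribeProjectVersions until the version leaves TRAINING_IN_PROGRESS.

    Returns the final status string (for example TRAINING_COMPLETED or
    TRAINING_FAILED). Raises TimeoutError if the polling budget runs out.
    """
    for _ in range(max_polls):
        response = client.describe_project_versions(
            ProjectArn=project_arn, VersionNames=[version_name]
        )
        descriptions = response["ProjectVersionDescriptions"]
        if descriptions:
            status = descriptions[0]["Status"]
            if status != "TRAINING_IN_PROGRESS":
                return status
        # Still training (or not visible yet): wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")
```

The `sleep` argument exists only so the loop can be tried without real waiting; in normal use you would leave it at its default.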

Could you please elaborate on "script so that bboxes are created from the manifest prior to training"?

AWS
answered 4 years ago

For now, I have to use the console GUI: I select the radio button "Import images labeled by SageMaker Ground Truth" and specify the manifest's location in the bucket. Once I submit that, the console draws bounding boxes on the images based on the bbox coordinates in the manifest. Then I click the Train button and the training starts.

I want to avoid these manual GUI steps. If I can pass the manifest file and kick off the training with it, I will return here and mark this question as answered.

It seems that is possible, judging from this example: https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/tm-sdk.html

thanks.

answered 4 years ago

Following the SDK example and using CreateProjectVersion, I can kick off training and implicitly create the datasets by supplying two manifests, one for training and another for testing.
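For anyone starting from a single manifest, here is a small sketch of one way to produce the two manifests. It is just a reproducible random split of the JSON Lines entries; the function name, the 20% test fraction, and the seed are my own illustrative choices, not something the service requires.

```python
import random


def split_manifest(lines, test_fraction=0.2, seed=0):
    """Split JSON Lines manifest entries into (train, test) lists.

    Blank lines are dropped. The shuffle uses a fixed seed so that the
    same input always yields the same split.
    """
    entries = [line.strip() for line in lines if line.strip()]
    rng = random.Random(seed)
    rng.shuffle(entries)
    if len(entries) > 1:
        # Keep at least one entry for testing.
        n_test = max(1, round(len(entries) * test_fraction))
    else:
        n_test = 0
    return entries[n_test:], entries[:n_test]
```

Each returned list can be written out with "\n".join(...) as its own manifest file and uploaded to S3 before calling CreateProjectVersion.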

I also want to credit sdoloris. His website provides a helpful tutorial for creating the manifest. Marking my question as answered. Thank you.

Edited by: mymingle on Mar 17, 2020 4:03 PM

answered 4 years ago

Great! Please feel free to reach out if you have any other questions.

AWS
answered 4 years ago
