Custom training on videos in Rekognition


I am doing content moderation of videos, and some categories of videos do not return accurate labels. So I tried using Rekognition Custom Labels, but I did not see any way to use it with videos; it seems to be available only for images. Is that right? Is there any other way to custom-train on videos?
Thank you

asked a year ago (114 views)
3 Answers

Hey Rashmi

I have passed on your feature feedback to the team.

I would recommend extracting frames directly from the video file or stream (for example, with the ffmpeg or OpenCV libraries; see link below) rather than taking screenshots.
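As a rough sketch of the extraction step, assuming Python with OpenCV (`opencv-python`) installed, the following keeps roughly `sample_fps` frames per second of video and writes them out as JPEG files. The function names, file naming, and default rate are illustrative choices of mine, not part of any AWS API.

```python
from __future__ import annotations

import os


def frame_indices_to_sample(total_frames: int, source_fps: float, sample_fps: float) -> list[int]:
    """Indices of the frames to keep so that roughly `sample_fps` frames per second survive."""
    if source_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Keep one frame every `step` source frames; if sample_fps >= source_fps, keep every frame.
    step = max(source_fps / sample_fps, 1.0)
    indices = []
    i = 0
    while int(i * step) < total_frames:
        indices.append(int(i * step))  # int() floors non-negative values
        i += 1
    return indices


def extract_frames(video_path: str, out_dir: str, sample_fps: float = 1.0) -> list[str]:
    """Decode `video_path` and write the sampled frames to `out_dir` as JPEG files."""
    import cv2  # imported here so frame_indices_to_sample works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise OSError(f"could not open video: {video_path}")
    source_fps = cap.get(cv2.CAP_PROP_FPS)
    # The reported frame count is an estimate for some containers; frames past it are skipped.
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(frame_indices_to_sample(total_frames, source_fps, sample_fps))
    os.makedirs(out_dir, exist_ok=True)
    written = []
    index = 0
    try:
        # Read frames sequentially rather than seeking, which can be imprecise for some codecs.
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index in wanted:
                path = os.path.join(out_dir, f"frame_{index:06d}.jpg")
                if cv2.imwrite(path, frame):
                    written.append(path)
            index += 1
    finally:
        cap.release()
    return written
```

For example, `extract_frames("input.mp4", "frames", sample_fps=2.0)` would write about two frames per second of video. With ffmpeg, a similar result comes from `ffmpeg -i input.mp4 -vf fps=2 frames/frame_%06d.jpg`. Decoding every frame is slower than seeking, but it avoids landing on the wrong frame.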

The frame extraction rate depends on your use case, so it is hard to recommend a specific number.

For the training and test sets, include an annotated mix of frames that contain your objects of interest and frames that do not. Once you have trained the model, you can sample frames for inference at a constant rate (say, between 1 and 5 FPS) and run inference on each frame to see whether the model catches your objects of interest.
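One thing worth watching when building the training and test sets: neighbouring frames from the same video are nearly identical, so a purely random per-frame split can put near-duplicates on both sides and make test results look better than they are. A minimal sketch of a grouped split, assuming each extracted frame is tagged with a hypothetical `video_id` and a label (for example "none" for frames without the objects of interest):

```python
from __future__ import annotations

import random
from collections import defaultdict
from typing import NamedTuple


class Frame(NamedTuple):
    video_id: str  # hypothetical: identifier of the source video
    path: str      # local path or S3 key of the extracted frame image
    label: str     # e.g. a category name, or "none" for a negative example


def split_by_video(
    frames: list[Frame], test_fraction: float = 0.2, seed: int = 0
) -> tuple[list[Frame], list[Frame]]:
    """Split frames into (train, test) so that no video appears in both sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Group frames by the video they were extracted from.
    by_video: dict[str, list[Frame]] = defaultdict(list)
    for frame in frames:
        by_video[frame.video_id].append(frame)
    # Sort first so the shuffle result depends only on the seed.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)
    # Hold out whole videos for testing, keeping at least one video on each side.
    if len(video_ids) > 1:
        n_test = min(max(1, round(len(video_ids) * test_fraction)), len(video_ids) - 1)
    else:
        n_test = 0
    test_ids = set(video_ids[:n_test])
    train = [f for f in frames if f.video_id not in test_ids]
    test = [f for f in frames if f.video_id in test_ids]
    return train, test
```

After splitting, it is worth checking that both sets contain frames with and without the objects of interest; with only a few videos, a grouped split can leave a label missing from the test set.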

You can find an example of running inference on video here:

answered a year ago
  • Hi, could you please point me to a resource on how to do custom training for content moderation? The documentation says the Rekognition Custom Labels console cannot be used for content moderation, so which API can be used for this? Also, when labeling the images, should I use image-level labels? And when I label, should I assign the "Second-level Category"?

    Thank you, Rashmi


Hi rashub,

Rekognition Custom Labels does not currently support training or inference on videos. The only way to use Rekognition Custom Labels for inference on a video is to extract the video frames as images yourself and then send them to Rekognition Custom Labels for inference. Training a custom model directly on a video is not supported.
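Once the frames are extracted as images, the per-frame inference loop could look roughly like this sketch, assuming boto3 and a Custom Labels model version that has already been started. The function name and the `(timestamp, image_bytes)` input shape are hypothetical; the only AWS call is `detect_custom_labels`. The client is passed in so the loop can be exercised without AWS access.

```python
from __future__ import annotations

from typing import Any, Iterable


def label_frames(
    client: Any,
    project_version_arn: str,
    frames: Iterable[tuple[float, bytes]],
    min_confidence: float = 50.0,
) -> list[dict[str, Any]]:
    """Run Custom Labels inference on (timestamp_seconds, image_bytes) pairs.

    Returns one entry per detected label, tagged with the frame's timestamp.
    """
    hits: list[dict[str, Any]] = []
    for timestamp, image_bytes in frames:
        # One DetectCustomLabels call per frame; the model version must be running.
        response = client.detect_custom_labels(
            ProjectVersionArn=project_version_arn,
            Image={"Bytes": image_bytes},
            MinConfidence=min_confidence,
        )
        for label in response.get("CustomLabels", []):
            hits.append(
                {"timestamp": timestamp, "name": label["Name"], "confidence": label["Confidence"]}
            )
    return hits
```

In practice `client` would be `boto3.client("rekognition")`, and each timestamp could be computed as the frame index divided by the video's frame rate. Because every frame is a separate API call, the sampling rate also determines how long inference over a whole video takes.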

Best regards,

answered a year ago

Hi Christian,
Okay thanks!
Is there any chance this feature will be added in the near future?
I have one more question about your suggestion of extracting frames and running inference on images. I want to get accurate results for black-and-white videos. Currently, no moderation labels are returned for most black-and-white videos, even good-resolution ones. If I take screenshots or extract frames, roughly how many frames per video should I extract and use for custom training?


answered a year ago
