Why isn't there more documentation on the file structure of the Comprehend custom document classification training data format?

Someone should add notes to the documentation stating that you should use relative file paths together with the document prefix. Knowing that you can literally put the bucket as the prefix and then list folders and file names would be really helpful!

Additionally, add documentation explaining that you shouldn't have duplicates and why they weigh down the model. The number shown in the examples is a page number, and you can reference a single PDF many times by using these page numbers. I cannot find any of this information.
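
To illustrate the kind of note I mean, here is a minimal Python sketch of how I understand the annotations file to be laid out: one CSV row per labeled page, holding the class, a document path relative to the document prefix, and the page number. The helper names (`build_annotation_rows`, `to_csv`), the labels, and the file paths are made up for illustration, and the column order is my reading of the examples rather than something the documentation spells out.

```python
import csv
import io


def build_annotation_rows(labeled_pages):
    """Build de-duplicated annotation rows.

    labeled_pages: iterable of (label, relative_path, page_number) tuples.
    Paths are assumed to be relative to the document prefix, so full
    S3 URIs and absolute paths are rejected.
    """
    seen = set()
    rows = []
    for label, rel_path, page in labeled_pages:
        if rel_path.startswith("s3://") or rel_path.startswith("/"):
            raise ValueError(f"expected a relative path, got {rel_path!r}")
        if page < 1:
            raise ValueError(f"page numbers start at 1, got {page}")
        key = (label, rel_path, page)
        if key in seen:
            # Skip exact duplicate rows; they add nothing for training.
            continue
        seen.add(key)
        rows.append([label, rel_path, page])
    return rows


def to_csv(rows):
    """Serialize rows as CSV text with no header line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical example: one PDF referenced several times by page number.
example = [
    ("INVOICE", "invoices/2023/a.pdf", 1),
    ("RECEIPT", "invoices/2023/a.pdf", 2),
    ("INVOICE", "invoices/2023/a.pdf", 1),  # duplicate, dropped
]
print(to_csv(build_annotation_rows(example)), end="")
```

The same PDF appears on two rows with different page numbers, and the repeated row is dropped before the file is written.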

Hoping this post attracts some eyes to elicit change!

Wenta
asked 4 months ago, 433 views
1 Answer

Amazon Comprehend's documentation may have changed since my last knowledge update in January 2022. Historically, though, there are several possible reasons why thorough documentation on the file structure of Comprehend custom document classification training data was not available:

Thanks to Comprehend's flexibility and customization features, custom document classifiers can be trained on a variety of document types and formats. That same flexibility may make it difficult to provide file structure documentation that applies universally.

Service Evolution:

AWS services are continuously updated and enhanced. At some point in the service's development, improving core functionality may have taken precedence over detailed documentation.

Use-Case-Specific Needs:

Different industries and use cases may require quite different approaches to document classification. Users could possess differing

answered 4 months ago
