Why isn't there more documentation on the file structure of the training data for Comprehend custom document classification?


Someone should add notes to the documentation stating that you should use relative file paths together with the document prefix. It would be really helpful to know that you can literally use the bucket as the prefix and then put folders and file names in the paths!

Additionally, the documentation should explain that you shouldn't have duplicates, and why they weigh down the model. The numbers shown in the examples are page numbers, and you can reference a single PDF many times by using these page numbers. I cannot find any of this information.
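To make the layout described above concrete, here is a minimal Python sketch that builds annotation rows from relative paths and page numbers and drops exact duplicates. The column order (label, relative path, page number), the function names, and the sample file names are assumptions for illustration only, based on the description in this post; please check them against the current Comprehend documentation before relying on them.

```python
import csv
import io


def build_annotation_rows(labeled_pages):
    """Return de-duplicated [label, relative_path, page] rows.

    labeled_pages is an iterable of (label, relative_path, page) tuples.
    Assumption: paths are relative to the document prefix (for example a
    bucket root such as s3://my-bucket/, which is a hypothetical name).
    """
    seen = set()
    rows = []
    for label, rel_path, page in labeled_pages:
        # Reject absolute paths and full S3 URIs; only relative paths are kept.
        if rel_path.startswith(("s3://", "/")):
            raise ValueError(f"expected a relative path, got {rel_path!r}")
        key = (label, rel_path, page)
        if key in seen:
            # Skip exact duplicates so the same page is not listed twice.
            continue
        seen.add(key)
        rows.append([label, rel_path, page])
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with one annotation per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical example: one PDF referenced several times via page numbers.
examples = [
    ("INVOICE", "invoices/2023/acme.pdf", 1),
    ("RECEIPT", "invoices/2023/acme.pdf", 2),
    ("INVOICE", "invoices/2023/acme.pdf", 1),  # duplicate, will be dropped
]
print(to_csv(build_annotation_rows(examples)))
```

The same PDF appears on two rows with different page numbers, and the repeated third entry is filtered out before the file is written.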

Hoping this post attracts some eyes to elicit change!

Wenta
asked 4 months ago · 443 views
1 Answer

AWS Comprehend's documentation may have changed since my last knowledge update in January 2022. In the past, though, there may have been several reasons why thorough documentation on the file structure of Comprehend custom document classification training data was not available:

Custom document classifiers can be trained on a variety of document types and formats thanks to Comprehend's flexibility and customization features. That flexibility can make it difficult to provide file structure documentation that applies universally.

Service Evolution:

Continuous updates and enhancements are made to AWS services. At some point during the service's development, improving core functionality may have taken precedence over detailed documentation.

Use-Case-Specific Requirements:

Different industries and use cases may require quite different approaches to document classification. Users could possess differing

answered 4 months ago
