Why isn't there more documentation on the file structure of the Comprehend custom document classification training data?


Someone should add some notes to the documentation stating that you should use relative file paths together with the document prefix. Mentioning that you can literally use the bucket as the prefix and then put the folders and file names in the training file would be really helpful!
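As a rough illustration of the relative-path point, here is a sketch that assumes each file name in the training CSV is resolved relative to the S3 prefix configured for the documents. Under that assumption a bucket root works as the prefix, and the folders go into the file name. The bucket `amzn-s3-demo-bucket`, the helper `resolve_document_uri`, and the paths are hypothetical; check the current Comprehend documentation for how paths are actually resolved.

```python
# Sketch only: assumes CSV file names are relative to the documents S3 prefix.
# The bucket name, helper name, and paths below are hypothetical.


def resolve_document_uri(documents_prefix: str, relative_path: str) -> str:
    """Join the documents prefix and a CSV file name into a single S3 URI."""
    if relative_path.startswith("s3://"):
        # Full URIs in the CSV defeat the purpose of the prefix.
        raise ValueError("CSV file names should be relative to the documents prefix")
    # Normalise slashes so the prefix and path are joined by exactly one "/".
    return documents_prefix.rstrip("/") + "/" + relative_path.lstrip("/")


if __name__ == "__main__":
    prefix = "s3://amzn-s3-demo-bucket"  # bucket root used as the prefix
    print(resolve_document_uri(prefix, "invoices/2023/invoice_001.pdf"))
    # s3://amzn-s3-demo-bucket/invoices/2023/invoice_001.pdf
```

Keeping the CSV relative means the same training file can be reused if the documents move to a different bucket; only the prefix changes.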

Additionally, please add documentation explaining that you shouldn't include duplicates, and why they weigh down the model. The numbers shown in the examples are page numbers, and by using these page numbers you can reference a single PDF many times. I cannot find any of this information.
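A minimal sketch of preparing such a training file, assuming each row has the form `label,relative_file_name,page_number` as in the examples mentioned above. It writes one row per labelled page, so a single PDF can appear many times, and drops exact duplicate rows. The function names, labels, and file paths are hypothetical, and the column order is an assumption to verify against the Comprehend docs.

```python
import csv
import io

# Sketch only: the (label, relative_path, page) column order is an assumption.


def build_training_rows(labelled_pages):
    """Return unique (label, relative_path, page) rows in first-seen order."""
    seen = set()
    rows = []
    for label, path, page in labelled_pages:
        key = (label, path, int(page))
        if key in seen:
            continue  # skip exact duplicates instead of repeating them
        seen.add(key)
        rows.append(key)
    return rows


def to_csv(rows):
    """Serialise rows as headerless CSV text, one row per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    pages = [
        ("LEASE", "contracts/lease.pdf", 1),
        ("LEASE", "contracts/lease.pdf", 2),
        ("LEASE", "contracts/lease.pdf", 2),  # duplicate, dropped
        ("INVOICE", "billing/inv_07.pdf", 1),
    ]
    print(to_csv(build_training_rows(pages)), end="")
    # LEASE,contracts/lease.pdf,1
    # LEASE,contracts/lease.pdf,2
    # INVOICE,billing/inv_07.pdf,1
```

Deduplicating on the whole (label, file, page) tuple keeps legitimate multi-page references to the same PDF while removing only rows that add nothing new.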

Hoping this post attracts some eyes to elicit change!

Wenta
asked 4 months ago · 443 views
1 Answer

Perhaps the AWS Comprehend documentation has changed since my last knowledge update in January 2022. In the past, though, there may have been several reasons why thorough documentation on the file structure of Comprehend custom document classification training data wasn't available:

Custom document classifiers can be trained on a variety of document types and formats thanks to Comprehend's flexibility and customization features. That same flexibility may make it difficult to provide file-structure documentation that applies universally.

Service Evolution:

AWS services are continuously updated and enhanced. At some point in the service's development, improving core functionality may have taken precedence over detailed documentation.

Use-Case-Specific Requirements:

Different industries and use cases may require quite different approaches to document classification. Users could possess differing

answered 4 months ago
