Why isn't there more documentation on the file structure and training data format for Comprehend custom document classification?

Someone should add notes to the documentation stating that you should use relative file paths together with the document prefix. Knowing that you can literally set the bucket as the prefix and then put folders and file names in the paths would be really helpful!

Additionally, add documentation explaining that you shouldn't have duplicates, and why they weigh down the model. The number shown in the examples is a page number, and you can reference a single PDF many times by using these page numbers. I cannot find any of this information.
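
To make the point concrete, here is a rough sketch of how I understand the annotations file, written in Python. It is based only on what I described above, not on anything the documentation confirms: I am assuming each row is a class label, a path relative to the document prefix, and a page number. The bucket name, folder names, class names, and helper functions are all hypothetical.

```python
import csv
import io

# Hypothetical prefix for illustration; here the prefix is just the bucket.
DOCUMENT_PREFIX = "s3://my-training-bucket/"


def build_annotation_rows(labels):
    """Drop exact duplicate rows while keeping the original order.

    labels: iterable of (class_name, relative_path, page_number) tuples.
    The three-column layout is an assumption based on the post above.
    """
    seen = set()
    rows = []
    for row in labels:
        if row in seen:
            continue  # the same class/file/page was already listed
        seen.add(row)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows as CSV text, one annotation per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def resolve(relative_path, prefix=DOCUMENT_PREFIX):
    """Show which object a relative path points to once the prefix is applied."""
    return prefix.rstrip("/") + "/" + relative_path.lstrip("/")


if __name__ == "__main__":
    labels = [
        ("INVOICE", "invoices/a.pdf", 1),
        ("INVOICE", "invoices/a.pdf", 2),  # same PDF, different page
        ("INVOICE", "invoices/a.pdf", 1),  # exact duplicate, dropped
        ("RECEIPT", "receipts/b.pdf", 1),
    ]
    rows = build_annotation_rows(labels)
    print(to_csv(rows), end="")
    print(resolve("invoices/a.pdf"))
```

The same PDF appears on several rows with different page numbers, and the paths in the file stay relative because the prefix supplies the bucket.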

Hoping this post attracts some eyes to elicit change!

Wenta
asked 4 months ago, viewed 443 times
1 Answer

Amazon Comprehend's documentation may have changed since my last knowledge update in January 2022. In the past, though, there may have been several reasons why thorough documentation was not available on the file structure of Comprehend custom document classification training data:

Custom document classifiers can be trained on a variety of document types and formats thanks to Comprehend's flexibility and customization features. Because of that flexibility, providing file structure documentation that applies universally may be difficult.

Service Evolution:

AWS services are continuously updated and enhanced. At some point in the service's development, improving core functionality may have taken precedence over detailed documentation.

Utilization-Specific Use Cases:

Different industries and use cases may require quite different approaches to document classification. Users could possess differing

answered 4 months ago
