1 Answer
Hi.
Unfortunately, asynchronous batch processing only accepts UTF-8-formatted text files as input. The documentation states:
"Documents must be in UTF-8-formatted text files."
https://docs.aws.amazon.com/comprehend/latest/dg/concepts-processing-modes.html#how-async
Alternatively, you can use Amazon Textract to extract the text from the PDF and send that text to Amazon Comprehend.
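A rough sketch of the Textract step, in case it helps. It assumes the PDF is already in S3 and that you pass in boto3 Textract and S3 clients; the function names, bucket, and keys are made up for illustration, and retries and throttling are not handled. It runs Textract's asynchronous text detection, keeps only the LINE blocks, and writes the result back to S3 as a UTF-8 text file:

```python
import time


def textract_lines(blocks):
    """Join the text of LINE blocks from a Textract response, one per line."""
    return "\n".join(b["Text"] for b in blocks if b.get("BlockType") == "LINE")


def pdf_to_utf8_text(textract, s3, bucket, pdf_key, txt_key, poll_seconds=5):
    """Extract text from a PDF in S3 and store it as a UTF-8 .txt object.

    `textract` and `s3` are expected to be boto3 clients (assumption);
    `bucket`, `pdf_key`, and `txt_key` are placeholder names.
    """
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": pdf_key}}
    )

    blocks, token = [], None
    while True:
        kwargs = {"JobId": job["JobId"]}
        if token:
            kwargs["NextToken"] = token
        page = textract.get_document_text_detection(**kwargs)

        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            # Job not finished yet: wait and poll again.
            time.sleep(poll_seconds)
            continue
        if status != "SUCCEEDED":
            raise RuntimeError(f"Textract job ended with status {status}")

        # Results are paginated; keep following NextToken until it is absent.
        blocks.extend(page.get("Blocks", []))
        token = page.get("NextToken")
        if not token:
            break

    text = textract_lines(blocks)
    # Comprehend's asynchronous jobs read UTF-8 text files from S3.
    s3.put_object(Bucket=bucket, Key=txt_key, Body=text.encode("utf-8"))
    return text
```

With boto3 this could be called as `pdf_to_utf8_text(boto3.client("textract"), boto3.client("s3"), "my-bucket", "pdf/doc.pdf", "text/doc.txt")`. The S3 location of the resulting text files would then be the input location for the Comprehend asynchronous job.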
Hi, thanks for your quick response! I'm just wondering whether it is okay that I trained my model using PDF annotations, rather than extracting the text first and then annotating that. Would the model still be accurate on the text extracted from the PDFs?
As you say, it's a good idea to thoroughly verify the accuracy of Amazon Textract first.