1 Answer
Hi.
Unfortunately, asynchronous batch processing only accepts UTF-8-formatted text files. From the documentation:
"Documents must be in UTF-8-formatted text files."
https://docs.aws.amazon.com/comprehend/latest/dg/concepts-processing-modes.html#how-async
Alternatively, you can use Amazon Textract to extract the text from the PDF and then send that text to Amazon Comprehend.
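The Textract-then-Comprehend route could look roughly like the sketch below. It is only an illustration under some assumptions: it uses boto3, the PDF is a single page already stored in S3 (multi-page PDFs would need Textract's asynchronous StartDocumentTextDetection instead), and the bucket names, keys, role ARN, and recognizer ARN are placeholders you would replace. The helper function names are made up for this example.

```python
def lines_from_textract(response):
    """Join the text of all LINE blocks in a Textract response."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


def pdf_to_utf8_text_object(textract, s3, bucket, pdf_key, text_key):
    """Extract text from a single-page PDF in S3 and store it as UTF-8 text."""
    # Synchronous call; assumed sufficient for a single-page document.
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": pdf_key}}
    )
    text = lines_from_textract(response)
    # Encode explicitly as UTF-8, the format asynchronous batch jobs expect.
    s3.put_object(Bucket=bucket, Key=text_key, Body=text.encode("utf-8"))
    return f"s3://{bucket}/{text_key}"


def start_entities_job(comprehend, input_uri, output_uri, role_arn,
                       recognizer_arn, language_code="en"):
    """Start an asynchronous entity detection job on the extracted text."""
    response = comprehend.start_entities_detection_job(
        InputDataConfig={"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_uri},
        DataAccessRoleArn=role_arn,
        LanguageCode=language_code,
        EntityRecognizerArn=recognizer_arn,  # placeholder: your custom model
    )
    return response["JobId"]


# Example wiring (placeholder names, requires boto3 and AWS credentials):
#   import boto3
#   uri = pdf_to_utf8_text_object(boto3.client("textract"), boto3.client("s3"),
#                                 "my-bucket", "in/doc.pdf", "text/doc.txt")
#   job_id = start_entities_job(boto3.client("comprehend"), uri,
#                               "s3://my-bucket/out/", ROLE_ARN, RECOGNIZER_ARN)
```

The clients are passed in as arguments so the extraction step can be checked with stub objects before running it against real documents.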
Hi, thanks for your quick response! I'm just wondering whether it is okay that I trained my model using PDF annotations rather than extracting the text first and then annotating that. Would it still be accurate on the text extracted from the PDFs?
As you say, it's a good idea to thoroughly verify the accuracy of Amazon Textract first.