1 Answer
Hi.
Unfortunately, asynchronous batch processing requires the input to be UTF-8-formatted text files. From the documentation:
"Documents must be in UTF-8-formatted text files."
https://docs.aws.amazon.com/comprehend/latest/dg/concepts-processing-modes.html#how-async
Alternatively, you can use Amazon Textract to extract the text from the PDF and send that text to Amazon Comprehend.
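A minimal sketch of that Textract-to-Comprehend flow in Python with boto3 might look like the following. It rests on several assumptions that are not from the original answer: the document is a single page small enough to pass as raw bytes to the synchronous `detect_document_text` call (multi-page PDFs would need Textract's asynchronous API with the file in S3), the text is English, and Comprehend's built-in `detect_entities` stands in for whatever model you actually use. The helper names `extract_lines`, `detect_entities_in_document`, and `main`, and the file path passed to `main`, are hypothetical.

```python
def extract_lines(textract_client, document_bytes):
    """Return the text of every LINE block Textract finds in the document."""
    response = textract_client.detect_document_text(
        Document={"Bytes": document_bytes}
    )
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_entities_in_document(textract_client, comprehend_client,
                                document_bytes, language_code="en"):
    """Extract text with Textract, then pass it to Comprehend as one string."""
    text = "\n".join(extract_lines(textract_client, document_bytes))
    # Assumption: the extracted text fits within the size limit of the
    # synchronous Comprehend call; longer documents need chunking or a batch job.
    response = comprehend_client.detect_entities(
        Text=text, LanguageCode=language_code
    )
    return text, response["Entities"]


def main(path):
    """Hypothetical entry point; needs boto3 and configured AWS credentials."""
    import boto3  # imported here so the helpers above work without AWS access

    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")
    with open(path, "rb") as handle:
        text, entities = detect_entities_in_document(
            textract, comprehend, handle.read()
        )
    for entity in entities:
        print(entity["Type"], entity["Text"], entity["Score"])
```

For the asynchronous batch mode discussed above, the same extracted text could instead be saved as a UTF-8 text file (for example with `text.encode("utf-8")`) and uploaded to the S3 location the job reads from.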
Hi, thanks for your quick response! I'm just wondering whether it is okay that I trained my model using PDF annotations, rather than extracting the text first and then annotating that. Would the model still be accurate on the text extracted from the PDFs?
As you say, it's a good idea to thoroughly verify the accuracy of Amazon Textract first.