Unfortunately, asynchronous batch jobs only accept UTF-8-formatted text files as input.
Documents must be in UTF-8-formatted text files.
Alternatively, you can use Amazon Textract to extract the text from the PDF and then send that text to Amazon Comprehend.
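The Textract-to-Comprehend route can be sketched roughly like this with boto3. This is a minimal sketch, not a definitive implementation: the helper names (`text_from_textract`, `extract_entities`, `main`) are hypothetical, it assumes the synchronous `detect_document_text` call is enough for your file (multipage PDFs generally need the asynchronous `start_document_text_detection` API with the file in S3), and it assumes English text and the standard entity model. For a custom recognizer endpoint you would presumably pass `EndpointArn` to `detect_entities` instead of `LanguageCode`.

```python
def text_from_textract(response):
    """Join the LINE blocks of a Textract response into plain text."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return "\n".join(lines)


def extract_entities(textract_client, comprehend_client, document_bytes,
                     language_code="en"):
    """Run Textract on the raw document bytes, then Comprehend on the text.

    Both clients are passed in so they can be configured (region,
    credentials) by the caller. language_code="en" is an assumption.
    """
    response = textract_client.detect_document_text(
        Document={"Bytes": document_bytes}
    )
    text = text_from_textract(response)
    result = comprehend_client.detect_entities(
        Text=text, LanguageCode=language_code
    )
    return result["Entities"]


def main(path):
    """Hypothetical entry point: print the entities found in one file."""
    import boto3  # imported here so the helpers above work without boto3

    with open(path, "rb") as f:
        document_bytes = f.read()

    entities = extract_entities(
        boto3.client("textract"),
        boto3.client("comprehend"),
        document_bytes,
    )
    for entity in entities:
        print(entity["Type"], entity["Text"], entity["Score"])
```

Calling `main("invoice.pdf")` (the file name is only an example) would run the whole chain. If you still want an asynchronous batch job, you could instead write the string returned by `text_from_textract` to a UTF-8 text file in S3 and point the job at that file.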