Textract Request has unsupported document format

Hi,

I'm trying to run the Textract API on a document, but it returns this error:

botocore.errorfactory.UnsupportedDocumentException: An error occurred (UnsupportedDocumentException) when calling the AnalyzeDocument operation: Request has unsupported document format

The thing is, extraction works fine when the same document is uploaded to the Textract Console. Other documents that my code loops through also work fine.

I've checked:
  • The format. It's a .pdf and it worked in the console.
  • The size. It's only 217 KB.
  • Pages. Only 7.
  • Open. No, I don't have the file open somewhere else.
  • Integrity. It does not appear to be corrupted or encrypted.
  • Regions. My bucket and Textract regions are the same, and the setup works on other documents.

I'd say this issue affects about 60% of the documents I'm trying to read.

Any help would be appreciated.

Thank you.

2 Answers

Hi Gregory, if you are calling the synchronous AnalyzeDocument API, there is a restriction on the number of PDF pages allowed. Currently, Textract synchronous operations support only single-page PDFs (see the Textract documentation), which would explain why your 7-page document is rejected. For multi-page PDFs, please use the asynchronous APIs instead. If you see errors only on the asynchronous APIs, please reach out to AWS Support and provide the JobId so the failure reason can be investigated.
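To illustrate the asynchronous route, here is a minimal sketch, not a definitive implementation. It assumes the PDF is already in S3 and that `client` is a Textract client such as the one returned by `boto3.client("textract")`. The function name `analyze_pdf_async`, the default feature types, and the polling interval are my own illustrative choices. It starts a job with StartDocumentAnalysis, polls GetDocumentAnalysis until the job finishes, then follows NextToken to collect every page of results.

```python
import time


def analyze_pdf_async(client, bucket, key, feature_types=("TABLES", "FORMS"),
                      poll_seconds=5.0, max_polls=120):
    """Run an asynchronous Textract analysis on an S3 object and return all Blocks.

    `client` is assumed to be a Textract client, e.g. boto3.client("textract").
    """
    # Start the job; the document must live in S3 for the async APIs.
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=list(feature_types),
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state (simple fixed-interval polling).
    for _ in range(max_polls):
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} is still in progress")

    # Anything other than SUCCEEDED is treated as an error in this sketch.
    if status != "SUCCEEDED":
        message = result.get("StatusMessage")
        raise RuntimeError(f"Textract job {job_id} ended as {status}: {message}")

    # Results are paginated; keep following NextToken until it is absent.
    blocks = list(result.get("Blocks", []))
    while result.get("NextToken"):
        result = client.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks
```

You would call it along the lines of `analyze_pdf_async(boto3.client("textract"), "my-bucket", "docs/file.pdf")`, where the bucket and key are placeholders. For production use, an SNS notification channel is usually a better fit than polling, and the JobId in the error message is what AWS Support would ask for if the job fails.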

AWS
answered a year ago
  • I'm not sure I understand what that is exactly. Are there some sample PDFs in there that you'd like me to try?
