Textract Request has unsupported document format


Hi,

I'm trying to run the Textract API on a document, but it returns an error: botocore.errorfactory.UnsupportedDocumentException: An error occurred (UnsupportedDocumentException) when calling the AnalyzeDocument operation: Request has unsupported document format

The thing is, extraction works fine when the same document is uploaded to the Textract Console. Other documents that my code loops through work fine as well.

I've checked:

  • The format. It's a .pdf and it worked in the console.
  • The size. It's only 217 KB.
  • Pages. Only 7.
  • Open. No, I don't have the file open anywhere else.
  • It does not appear to be corrupted or encrypted.
  • Regions. My bucket and Textract regions are the same, and the setup works on other documents.

I'd say this issue affects about 60% of the documents I'm trying to read.

Any help would be appreciated.

Thank you.

2 Answers
1

Hi Gregory, if you are calling the synchronous AnalyzeDocument API, note that it restricts the number of PDF pages allowed. Currently, the Textract synchronous operations only support single-page PDFs (Ref here), which would explain why a 7-page PDF is rejected as an unsupported format. For multi-page PDFs, please use the asynchronous APIs instead. If you see errors only on the async APIs, please reach out to AWS Support and provide the JobId so the failure reason can be investigated.
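For the async route, something along these lines may help. It is only a rough sketch, not an official AWS sample: it assumes the PDF is already in S3 and that the caller has permission for StartDocumentAnalysis and GetDocumentAnalysis. The function name `analyze_multipage_pdf`, the default feature types, and the polling interval are my own placeholders. In production you would probably rely on an SNS notification rather than polling.

```python
import time


def make_textract_client(region_name=None):
    """Create a Textract client (boto3 is imported lazily so the rest is testable)."""
    import boto3
    return boto3.client("textract", region_name=region_name)


def analyze_multipage_pdf(client, bucket, key, feature_types=("TABLES", "FORMS"),
                          poll_seconds=5, max_polls=120):
    """Start an async analysis job for an S3 document and return (job_id, blocks).

    `client` is a boto3 Textract client; all other names are illustrative.
    """
    start = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=list(feature_types),
    )
    job_id = start["JobId"]

    # Poll until the job leaves IN_PROGRESS, or give up after max_polls tries.
    for _ in range(max_polls):
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if status == "FAILED":
        # Keep the JobId: AWS Support will ask for it.
        raise RuntimeError(
            f"Textract job {job_id} failed: {result.get('StatusMessage')}"
        )

    # Results are paginated; follow NextToken until it is absent.
    blocks = list(result.get("Blocks", []))
    next_token = result.get("NextToken")
    while next_token:
        result = client.get_document_analysis(JobId=job_id, NextToken=next_token)
        blocks.extend(result.get("Blocks", []))
        next_token = result.get("NextToken")
    return job_id, blocks
```

You would call it as `analyze_multipage_pdf(make_textract_client(), "my-bucket", "path/to/file.pdf")`, where the bucket and key are placeholders for your own values.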

AWS
answered 1 year ago
0
  • I'm not sure I understand what that is exactly. Are there some sample PDFs in there that you'd like me to try?
