Textract Request has unsupported document format


Hi,

I'm trying to run the Textract API on a document, but it returns an error:

botocore.errorfactory.UnsupportedDocumentException: An error occurred (UnsupportedDocumentException) when calling the AnalyzeDocument operation: Request has unsupported document format

The thing is, extraction works fine when the same document is uploaded to the Textract Console. Other documents that my code loops through work fine as well.

I've checked:

  • The format. It's a .pdf and it worked on the console.
  • The size. It's only 217 KB.
  • Pages. Only 7.
  • Open. No, I don't have the file open somewhere else.
  • Integrity. It does not appear to be corrupted or encrypted.
  • Regions. My bucket and Textract regions are the same, and they work for other documents.

I'd say this issue affects about 60% of the documents I'm trying to read.

Any help would be appreciated.

Thank you.

2 Answers

Hi Gregory, if you are calling the synchronous AnalyzeDocument API, there is a limit on the number of PDF pages allowed. Textract's synchronous operations currently support only single-page PDFs. For multi-page PDFs, please use the asynchronous APIs instead. If you also see errors from the asynchronous APIs, please contact AWS Support and provide the JobId so the failure can be investigated.
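For reference, the asynchronous route mentioned above could look roughly like the sketch below. It assumes the PDF is already in S3 and that `client` is a boto3 Textract client; `StartDocumentAnalysis` and `GetDocumentAnalysis` are the asynchronous counterparts of `AnalyzeDocument`. The function name, the feature types, and the polling settings are illustrative choices, not something taken from the original code.

```python
import time


def analyze_multipage_pdf(client, bucket, key, poll_seconds=5, max_polls=120):
    """Run Textract's asynchronous analysis on a PDF in S3 and return all blocks.

    `client` is expected to behave like boto3.client("textract").
    The helper name and default values are hypothetical.
    """
    # Start the asynchronous job; multi-page PDFs must be read from S3.
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],  # assumed features; adjust as needed
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or we give up.
    for _ in range(max_polls):
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(
            f"Textract job {job_id} ended with status {status}: "
            f"{result.get('StatusMessage')}"
        )

    # Results are paginated; follow NextToken until every page is collected.
    blocks = list(result.get("Blocks", []))
    while "NextToken" in result:
        result = client.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks
```

It would be called along the lines of `analyze_multipage_pdf(boto3.client("textract"), "my-bucket", "my-doc.pdf")`, where the bucket and key are placeholders. If the job fails, the JobId in the raised error is the detail to pass on to AWS Support.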

AWS
answered 1 year ago
  • I'm not sure I understand what that is exactly. Are there some sample PDFs in there that you'd like me to try?
