
AWS Textract not accepting remote/non-AWS document URL?


I am creating a component that extracts the content of documents. Since some of them are multipage PDF documents, my understanding is that I need to use the `StartDocumentTextDetection` async method. This method requires the document to be stored in an S3 bucket in AWS. Is this assumption correct? All my documents are in an external, non-AWS location, which is basically a Dell EMC object storage. The documents can also be accessed over HTTP.

Can we pass a byte array of an external document, or a URL, to the AWS Textract async operations? I can see that bytes are accepted by the synchronous Textract operations, but not by the async ones. Please let me know.

Asked 1 year ago · 143 views
1 answer

Hello,

For asynchronous operations like StartDocumentTextDetection, Amazon Textract requires the input document to be stored in an Amazon S3 bucket; the asynchronous API does not accept byte arrays or URLs as input. With that in mind, you have the following options:

  • Upload the document to an S3 bucket before processing (the simplest option, if feasible).
  • If your documents are relatively small (under 5 MB) and you don't need the scalability of asynchronous processing, you can use the synchronous API (DetectDocumentText). The synchronous API accepts byte arrays, so you can process documents without storing them in S3.
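
A minimal sketch of both options in Python with boto3, assuming the external object store serves documents over plain HTTP(S) as the question says. The S3 and Textract clients are passed in (in practice `boto3.client("s3")` and `boto3.client("textract")`) so the logic can be exercised without AWS; the helper names, bucket, and key are hypothetical, and polling is used only to keep the example short.

```python
import time
import urllib.request

# Assumption: the 5 MB Bytes limit quoted in the answer, interpreted here as 5 MiB.
MAX_SYNC_BYTES = 5 * 1024 * 1024


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a document over HTTP(S), e.g. from the external object store."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def stage_and_start(s3, textract, data: bytes, bucket: str, key: str) -> str:
    """Option 1: copy the bytes into S3, then start an async job on that object."""
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["JobId"]


def wait_for_lines(textract, job_id: str, poll_seconds: float = 5.0) -> list[str]:
    """Poll until the job leaves IN_PROGRESS, then collect LINE text from every result page."""
    while True:
        page = textract.get_document_text_detection(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if status == "FAILED":
        raise RuntimeError(f"Textract job {job_id} failed")
    lines = []
    while True:  # SUCCEEDED or PARTIAL_SUCCESS: results may span several NextToken pages
        lines += [b["Text"] for b in page.get("Blocks", []) if b["BlockType"] == "LINE"]
        token = page.get("NextToken")
        if not token:
            return lines
        page = textract.get_document_text_detection(JobId=job_id, NextToken=token)


def detect_single_page(textract, data: bytes) -> list[str]:
    """Option 2: synchronous call with raw bytes (single-page documents only)."""
    if len(data) > MAX_SYNC_BYTES:
        raise ValueError("document too large for synchronous Bytes input")
    # boto3 base64-encodes the blob itself, so raw bytes are passed here.
    response = textract.detect_document_text(Document={"Bytes": data})
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
```

Usage might look like `job_id = stage_and_start(s3, tx, fetch_bytes(url), "my-staging-bucket", "incoming/doc.pdf")` followed by `wait_for_lines(tx, job_id)`. Instead of polling, Textract can also publish job completion to an SNS topic through the `NotificationChannel` parameter of `start_document_text_detection`.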

Hope that helps.

Cheers

AWS · Answered 1 year ago
AWS Expert · Reviewed 1 year ago
  • Does the synchronous API (DetectDocumentText) support multipage documents?

  • Synchronous APIs support single-page documents only, as noted here: https://docs.aws.amazon.com/textract/latest/dg/sync.html. You could also configure an S3 event notification (with a Lambda function) so that processing starts automatically as soon as your client uploads a document to S3, and/or set up a more complex workflow orchestrated by a service such as AWS Step Functions, which could include other steps, for example deleting the document from S3 once processing is done.
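
The S3-notification idea can be sketched as a Python Lambda handler. This assumes the bucket is configured to send `s3:ObjectCreated:*` events to the function and that the function's role is allowed to call Textract on those objects; `start_jobs_for_event` is a hypothetical helper name, split out so the event handling can be tested with a stand-in client.

```python
from urllib.parse import unquote_plus


def start_jobs_for_event(event: dict, textract) -> list[str]:
    """Start one async text-detection job per object listed in an S3 event notification."""
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (e.g. spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        response = textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        job_ids.append(response["JobId"])
    return job_ids


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    return {"JobIds": start_jobs_for_event(event, boto3.client("textract"))}
```

Deleting the staged object after processing, as suggested above, would be a later step (for example `s3.delete_object(Bucket=..., Key=...)` once the job has finished), which is where an orchestrator such as Step Functions becomes useful.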
