
AWS Textract not accepting a remote/non-AWS document URL?


I am creating a component that extracts the content of documents. Since some of them are multipage PDF documents, my understanding is that I need to use the `StartDocumentTextDetectionAsync` method. This method requires the document to be in an S3 bucket in AWS. Is this assumption correct? All my documents are in an external, non-AWS location, which is basically a Dell EMC ObjectStorage. The documents can also be accessed over HTTP.

Can we pass a byte array of an external document, or a URL, to the AWS Textract async operations? I can see that bytes are accepted by the synchronous Textract operations, but not by the async ones. Please let me know.

Asked a year ago, 143 views
1 answer

Hello,

For asynchronous operations like StartDocumentTextDetection, Amazon Textract requires the input document to be stored in an Amazon S3 bucket, and the asynchronous API does not accept byte arrays or URLs as input. With that in mind, you have the following options:

  • Upload the document to an S3 bucket before processing it (the simplest option, if possible).
  • If your documents are relatively small (under 5 MB) and you don't need the scalability of asynchronous processing, you can use the synchronous API (DetectDocumentText). The synchronous API accepts byte arrays, so you can process documents without storing them in S3.
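Both options above can be sketched in Python with boto3 (the question uses the .NET SDK, which exposes the same Textract operations). This is a minimal sketch, assuming the document is reachable over plain HTTP without authentication; the URL, bucket and key in the `__main__` block are hypothetical placeholders, and `copy_url_to_s3`, `detect_text_async` and `detect_text_sync` are illustrative helper names, not SDK functions. The clients are passed in as parameters so region and credentials stay under your control.

```python
import time
import urllib.request


def copy_url_to_s3(s3, url, bucket, key):
    """Stream a document reachable over HTTP (e.g. the object storage) into S3."""
    with urllib.request.urlopen(url) as response:
        # upload_fileobj reads from any file-like object, so the document is
        # streamed to S3 instead of being loaded fully into memory first.
        s3.upload_fileobj(response, bucket, key)


def detect_text_async(textract, bucket, key, poll_seconds=5):
    """Start an async job on an S3 object and return the LINE text of all pages."""
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves IN_PROGRESS.
    while True:
        result = textract.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if result["JobStatus"] == "FAILED":
        raise RuntimeError(f"Textract job {job_id} failed: {result.get('StatusMessage')}")

    # Results are paginated; follow NextToken to collect blocks from every page.
    lines = []
    while True:
        lines += [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]
        token = result.get("NextToken")
        if not token:
            break
        result = textract.get_document_text_detection(JobId=job_id, NextToken=token)
    return lines


def detect_text_sync(textract, document_bytes):
    """Synchronous call with raw bytes: no S3 needed, single-page input only."""
    result = textract.detect_document_text(Document={"Bytes": document_bytes})
    return [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]


if __name__ == "__main__":
    import boto3

    # Hypothetical placeholders: replace with your own URL, bucket and key.
    url = "https://objectstorage.example.com/docs/report.pdf"
    bucket, key = "my-textract-input", "incoming/report.pdf"

    s3 = boto3.client("s3")
    textract = boto3.client("textract")
    copy_url_to_s3(s3, url, bucket, key)
    for line in detect_text_async(textract, bucket, key):
        print(line)
```

Polling keeps the sketch short; `start_document_text_detection` also accepts a `NotificationChannel` (an SNS topic ARN plus an IAM role ARN) if you would rather be notified when the job completes.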

Hope that helps.

Cheers

AWS
Answered a year ago
AWS Expert
Reviewed a year ago
  • Does the synchronous API (DetectDocumentText) support multipage documents?

  • The synchronous APIs support single-page documents only, as mentioned here: https://docs.aws.amazon.com/textract/latest/dg/sync.html. You could also configure an S3 event notification (invoking a Lambda function) so that processing starts automatically as soon as your client uploads a document to S3, and/or set up a more complex workflow orchestrated by a service such as AWS Step Functions, which could include other steps, such as deleting the document from S3 once processing is done.
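The S3-notification approach might look like the Lambda handler below. This is a minimal sketch, assuming the bucket sends `s3:ObjectCreated:*` events to the function and that the function's execution role is allowed to call Textract and read the uploaded objects. `start_jobs` and `_textract_client` are hypothetical helpers; `start_jobs` is split out so the logic can be exercised without AWS.

```python
import urllib.parse

_textract = None


def _textract_client():
    """Create the Textract client lazily and reuse it across warm invocations."""
    global _textract
    if _textract is None:
        import boto3  # boto3 is included in the AWS Lambda Python runtimes

        _textract = boto3.client("textract")
    return _textract


def start_jobs(event, textract):
    """Start one asynchronous text-detection job per uploaded S3 object."""
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        job_ids.append(response["JobId"])
    return job_ids


def lambda_handler(event, context):
    # Entry point configured on the Lambda function.
    return {"jobIds": start_jobs(event, _textract_client())}
```

A later step (for example a Step Functions state or a second function triggered by an SNS `NotificationChannel`) would then read the results with `GetDocumentTextDetection` and could delete the source object.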


