Textract - How to extract just certain fields

0

Hello, I'm just getting started with Textract.

I only need to extract certain fields from invoices, but I see no option for that; it just extracts everything. How can I tell Textract which fields I need?

Thank you

Asked 2 years ago · 1274 views
5 answers
0

Hi! By default, Textract extracts either all text (DetectDocumentText API) or the relationships between text elements, such as forms and tables (AnalyzeDocument API). From your post, it sounds like AnalyzeDocument would suit your use case. You can then post-process the Textract response to extract the information you need. Hope this helps!
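
The post-processing step could look roughly like the sketch below. It assumes the response came from AnalyzeDocument with the FORMS feature type, where keys and values arrive as KEY_VALUE_SET blocks linked through Relationships. The function names `extract_form_fields` and `_text_of` and the key labels are hypothetical, chosen only for illustration.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(response, wanted_keys):
    """Return {key label: value text} for the requested form keys only.

    `response` is the dict returned by AnalyzeDocument (FORMS).
    Matching is case-insensitive and ignores a trailing colon on the label.
    """
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    wanted = {k.lower() for k in wanted_keys}
    found = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        label = _text_of(block, blocks_by_id).strip().rstrip(":").strip()
        if label.lower() not in wanted:
            continue
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values = [_text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]]
                found[label] = " ".join(values)
    return found
```

With boto3, `response` would come from something like `client.analyze_document(Document=..., FeatureTypes=["FORMS"])`, and `extract_form_fields(response, ["Invoice Number", "Total"])` would keep only those two labels, assuming they appear on the invoice under those names.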

AWS
Durga_S
Answered 2 years ago
0

Hello, if your use case is specifically extracting text from financial documents such as invoices and receipts, the AnalyzeExpense API might be the right one for you. Hope this is helpful!
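
As a rough sketch of how to keep only certain fields from an AnalyzeExpense response: each entry in ExpenseDocuments carries a SummaryFields list, and each field has a normalized Type (for example TOTAL or VENDOR_NAME) and a ValueDetection. The helper name `pick_expense_fields` is made up for this example.

```python
def pick_expense_fields(response, wanted_types):
    """Return one dict per expense document with only the wanted field types.

    `response` is the dict returned by AnalyzeExpense. If a type occurs more
    than once in a document, the first occurrence is kept.
    """
    wanted = set(wanted_types)
    results = []
    for doc in response.get("ExpenseDocuments", []):
        picked = {}
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            if field_type in wanted and field_type not in picked:
                picked[field_type] = field.get("ValueDetection", {}).get("Text", "")
        results.append(picked)
    return results
```

With boto3 the response would come from `client.analyze_expense(Document=...)`; line items live separately under LineItemGroups and are not handled here.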

Answered 2 years ago
0

Thank you for your response. I'm trying to use the API, but I receive "NoRegionError: You must specify a region." I have set the region in the default config file to the same one my S3 bucket is in. What could be the problem?

mytestingbuckettry EU (Paris) eu-west-3

[default]
region = eu-west-3
output =
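
For anyone hitting the same NoRegionError, a minimal sketch of two things to try, assuming boto3 is the SDK in use: check what the config text actually yields for the profile, and pass the region explicitly when creating the client. The helper names `region_from_config` and `textract_client` are hypothetical; `boto3.client(..., region_name=...)` and the `AWS_DEFAULT_REGION` environment variable are standard.

```python
import configparser


def region_from_config(config_text, profile="default"):
    """Return the region set for a profile in AWS config text, or None.

    In ~/.aws/config the default profile is the [default] section and
    named profiles are [profile NAME] sections.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_option(section, "region"):
        return parser.get(section, "region").strip() or None
    return None


def textract_client(region_name):
    """Create a Textract client with the region passed explicitly."""
    import boto3  # imported lazily so the helper above works without boto3
    return boto3.client("textract", region_name=region_name)
```

Passing `region_name` sidesteps the config lookup entirely, which helps narrow down whether the file is being read at all (wrong path, wrong profile, or a different user account running the script).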

Thanks

Answered 2 years ago
0

In short, you cannot. What you can do is use the AnalyzeDocument API, which will give you the LINES, WORDS, TABLES, and FORMS in the entire document. You can then take the response JSON and use a tool like amazon-textract-textractor to parse it and get the desired output.

That being said, since you are looking for only specific pieces of information from your document, you might want to try the new Textract Queries feature. With Queries, you pass in questions in natural language (English) and let Textract find the values in the document. Queries are supported via the AnalyzeDocument API by passing in QueriesConfig, as shown below:

"QueriesConfig": {
   "Queries": [
      {
         "Alias": "string",
         "Pages": [ "string" ],
         "Text": "string"
      }
   ]
}
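
A minimal sketch of using this from Python, assuming boto3: one helper builds the QueriesConfig from alias/question pairs, and another reads the answers back from the QUERY and QUERY_RESULT blocks in the response. The helper names and the example questions are made up for illustration.

```python
def build_queries_config(questions, pages=None):
    """Build a QueriesConfig dict from {alias: question text}.

    `pages` is optional, e.g. ["1"] or ["*"]; it is left out when None.
    """
    queries = []
    for alias, text in questions.items():
        query = {"Text": text, "Alias": alias}
        if pages is not None:
            query["Pages"] = list(pages)
        queries.append(query)
    return {"Queries": queries}


def answers_by_alias(response):
    """Map each query alias (or its text if no alias) to the first answer.

    Queries that Textract could not answer map to None.
    """
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    answers = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        name = block["Query"].get("Alias") or block["Query"]["Text"]
        answers[name] = None
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER" or not rel["Ids"]:
                continue
            result = blocks_by_id.get(rel["Ids"][0])
            if result is not None and result["BlockType"] == "QUERY_RESULT":
                answers[name] = result.get("Text")
    return answers
```

The config would be passed as `client.analyze_document(Document=..., FeatureTypes=["QUERIES"], QueriesConfig=build_queries_config({"TOTAL": "What is the invoice total?"}))`, and `answers_by_alias(response)` then returns only the fields that were asked for.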
AWS
Expert
Anjan
Answered 2 years ago
0

Hi, there are a couple of options to add to the ones already mentioned. You can use the asynchronous operations, start_document_analysis together with get_document_analysis, which also support the Queries feature, and phrase your questions hierarchically based on the text context of the invoice or receipt. If you have more questions, let me know.
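
A rough sketch of that async flow, assuming a boto3 Textract client and a document stored in S3: start the job, poll until it leaves IN_PROGRESS, then follow NextToken to collect every page of blocks. The function name `run_async_analysis` and the polling defaults are arbitrary choices for illustration, and any status other than SUCCEEDED (including PARTIAL_SUCCESS) is treated as an error here.

```python
import time


def run_async_analysis(client, bucket, key, queries_config,
                       poll_seconds=5, max_polls=120):
    """Run an async QUERIES analysis on an S3 document and return all blocks."""
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig=queries_config,
    )
    job_id = job["JobId"]

    # Poll until the job is no longer in progress, or give up.
    for _ in range(max_polls):
        page = client.get_document_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} is still in progress")

    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: keep requesting pages while a NextToken is returned.
    blocks = list(page.get("Blocks", []))
    while page.get("NextToken"):
        page = client.get_document_analysis(JobId=job_id, NextToken=page["NextToken"])
        blocks.extend(page.get("Blocks", []))
    return blocks
```

For production use, Textract can also publish job completion to an SNS topic instead of being polled; the loop above is only the simplest way to try the API.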

Ignacio
Answered 6 months ago
