AnalyzeExpenseResponse returns non-JSON output in Java

Hi, I am using AnalyzeExpense for extracting invoice data and I am using the following code:

    TextractClient textractclient = TextractClient.builder()
            .region(Region.EU_WEST_1)
            .build();

    // Point the request at the invoice stored in S3.
    AnalyzeExpenseRequest request = AnalyzeExpenseRequest.builder()
            .document(Document.builder()
                    .s3Object(S3Object.builder()
                            .name("my_file.pdf")
                            .bucket("textract-console-eu-west-1-28f4ed17-0a66-4382-88ca-30b327cea233")
                            .build())
                    .build())
            .build();

    AnalyzeExpenseResponse result = textractclient.analyzeExpense(request);
    System.out.println("Result: " + result.toString());

The result is shown as:

Result: AnalyzeExpenseResponse(DocumentMetadata=DocumentMetadata(Pages=1), ExpenseDocuments=[ExpenseDocument(ExpenseIndex=1, SummaryFields=[ExpenseField(Type=ExpenseType(Text=VENDOR_NAME, Confidence=99.30162), ValueDetection=ExpenseDetection(Text=PDSVISION, Geometry=Geometry(BoundingBox=BoundingBox(Width=0.2754717, Height=0.024333334, Left=0.03254717, Top=0.016), Polygon=[Point(X=0.03254717, Y=0.016), Point(X=0.30801886, Y=0.016), Point(X=0.30801886, Y=0.040333334), Point(X=0.03254717, Y=0.040333334)]), Confidence=99.0582), PageNumber=1), ExpenseField(Type=ExpenseType(Text=OTHER, Confidence=95.5), LabelDetection=ExpenseDetection(Text=Our Order No:, Geometry=Geometry(BoundingBox=BoundingBox(Width=0.10712053, Height=0.010183533, Left=0.28589082, Top=0.29408652), Polygon=[Point(X=0.28589082, Y=0.29408652), Point(X=0.39301136, Y=0.29408652), Point(X=0.39301136, Y=0.30427003), Point(X=0.28589082, Y=0.30427003)]), Confidence=95.44447), ValueDetection=ExpenseDetection(Text=SO2631, Geometry=Geometry(BoundingBox=BoundingBox(Width=0.0567838, Height=0.00983429, Left=0.28696728, Top=0.3110361),........

So, the result is not in JSON format.

How can I extract the fields from this output? For example, how can I extract INVOICE_NUMBER, INVOICE_DATE, and so on in Java?

Thanks & Regards, Sameh

asked 2 years ago, 243 views
1 Answer

Hi Sameh, thanks for using the Textract service. Please see the guidance here: https://docs.aws.amazon.com/textract/latest/dg/analyzing-document-expense.html. The response is a typed Java object rather than a JSON string, so you read it through its accessor methods. Call result.expenseDocuments() to get a list of ExpenseDocument objects, and then extract the information you need from those objects.
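As a rough illustration of walking those objects, here is a minimal sketch that groups the detected values by field type. To keep it runnable without the SDK on the classpath, it declares small stand-in records; these are an assumption of this sketch, not the SDK classes, and they mirror only the accessor names used in the loop (summaryFields(), type(), valueDetection(), text()). The class name ExpenseFieldSketch and the method valuesByType are hypothetical names chosen for the example. The sample data is taken from the first two fields of the output in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExpenseFieldSketch {

    // Stand-in types (assumption): they mirror only the accessor names used below,
    // so the sketch runs without the AWS SDK. They are not the SDK classes.
    record ExpenseType(String text) {}
    record ExpenseDetection(String text) {}
    record ExpenseField(ExpenseType type, ExpenseDetection labelDetection, ExpenseDetection valueDetection) {}
    record ExpenseDocument(List<ExpenseField> summaryFields) {}

    // Groups detected values by field type, e.g. "VENDOR_NAME" -> ["PDSVISION"].
    // A list per type is used because a type such as OTHER can occur many times.
    static Map<String, List<String>> valuesByType(List<ExpenseDocument> documents) {
        Map<String, List<String>> values = new LinkedHashMap<>();
        for (ExpenseDocument document : documents) {
            for (ExpenseField field : document.summaryFields()) {
                // Skip fields that have no type or no detected value.
                if (field.type() == null || field.valueDetection() == null) {
                    continue;
                }
                values.computeIfAbsent(field.type().text(), key -> new ArrayList<>())
                      .add(field.valueDetection().text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample data copied from the first two fields of the output in the question.
        ExpenseDocument document = new ExpenseDocument(List.of(
                new ExpenseField(new ExpenseType("VENDOR_NAME"), null,
                        new ExpenseDetection("PDSVISION")),
                new ExpenseField(new ExpenseType("OTHER"), new ExpenseDetection("Our Order No:"),
                        new ExpenseDetection("SO2631"))));

        Map<String, List<String>> values = valuesByType(List.of(document));
        System.out.println(values.get("VENDOR_NAME"));
        // Types that were not detected are simply absent from the map.
        System.out.println(values.getOrDefault("INVOICE_NUMBER", List.of()));
    }
}
```

Against the real response, the same loop should work if the stand-in records are dropped in favor of the classes of the same names from software.amazon.awssdk.services.textract.model, with the method called as valuesByType(result.expenseDocuments()). Looking up "INVOICE_NUMBER" or "INVOICE_DATE" in the returned map then gives the detected values for those types, if Textract found any.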

AWS
answered 2 years ago
  • Thank you. This is very helpful and I am able to extract the data.

    But I can see that one of the fields (Invoice Total) is picked from the wrong place. How can I teach AWS where to pick the correct Total value on this invoice, so that if the invoice is sent again, AWS picks the correct value?

    As far as I know, AWS Textract is self-learning and uses AI to extract the data from the invoice, so is there any way I can tell AWS the correct location to pick the value from?

  • Hi, unfortunately we don't support custom config or training for AnalyzeExpense. We will keep improving the model quality to provide more accurate detections.
