Hi,
I am using AnalyzeExpense to extract invoice data with the following code:
```java
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.textract.TextractClient;
import software.amazon.awssdk.services.textract.model.*;

TextractClient textractclient = TextractClient.builder().region(Region.EU_WEST_1).build();
// Analyze a PDF that is already stored in S3
AnalyzeExpenseRequest request = AnalyzeExpenseRequest.builder()
        .document(Document.builder().s3Object(S3Object.builder().name("my_file.pdf")
                .bucket("textract-console-eu-west-1-28f4ed17-0a66-4382-88ca-30b327cea233").build()).build())
        .build();
AnalyzeExpenseResponse result = textractclient.analyzeExpense(request);
System.out.println("Result: " + result.toString());
```
The output looks like this:
```
Result: AnalyzeExpenseResponse(DocumentMetadata=DocumentMetadata(Pages=1), ExpenseDocuments=[ExpenseDocument(ExpenseIndex=1, SummaryFields=[ExpenseField(Type=ExpenseType(Text=VENDOR_NAME, Confidence=99.30162), ValueDetection=ExpenseDetection(Text=PDSVISION, Geometry=Geometry(BoundingBox=BoundingBox(Width=0.2754717, Height=0.024333334, Left=0.03254717, Top=0.016), Polygon=[Point(X=0.03254717, Y=0.016), Point(X=0.30801886, Y=0.016), Point(X=0.30801886, Y=0.040333334), Point(X=0.03254717, Y=0.040333334)]), Confidence=99.0582), PageNumber=1), ExpenseField(Type=ExpenseType(Text=OTHER, Confidence=95.5), LabelDetection=ExpenseDetection(Text=Our Order No:, Geometry=Geometry(BoundingBox=BoundingBox(Width=0.10712053, Height=0.010183533, Left=0.28589082, Top=0.29408652), Polygon=[Point(X=0.28589082, Y=0.29408652), Point(X=0.39301136, Y=0.29408652), Point(X=0.39301136, Y=0.30427003), Point(X=0.28589082, Y=0.30427003)]), Confidence=95.44447), ValueDetection=ExpenseDetection(Text=SO2631, Geometry=Geometry(BoundingBox=BoundingBox(Width=0.0567838, Height=0.00983429, Left=0.28696728, Top=0.3110361),........
```
So the result is not in JSON format.
How can I extract individual fields from this output in Java, for example INVOICE_NUMBER, INVOICE_DATE, and so on?
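The printed text is not JSON; it is the SDK's `toString()` debug representation of the response object. With the AWS SDK for Java v2 you can walk the model objects directly: `expenseDocuments()` → `summaryFields()` → `type().text()` / `valueDetection().text()`. Below is a minimal sketch under these assumptions: the class name `ExpenseFieldExtractor` is hypothetical, and the normalized type names for invoice number and date are, as far as I know, `INVOICE_RECEIPT_ID` and `INVOICE_RECEIPT_DATE` (not `INVOICE_NUMBER` / `INVOICE_DATE`), so please check them against the Textract documentation. Fields of type `OTHER` (like "Our Order No:" in the output above) are keyed by their printed label instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

import software.amazon.awssdk.services.textract.model.*;

/** Hypothetical helper: flattens AnalyzeExpense summary fields into a simple map. */
final class ExpenseFieldExtractor {

    static Map<String, String> summaryFields(AnalyzeExpenseResponse response) {
        Map<String, String> values = new LinkedHashMap<>();
        Map<String, Float> bestConfidence = new HashMap<>();
        for (ExpenseDocument doc : response.expenseDocuments()) {
            for (ExpenseField field : doc.summaryFields()) {
                // Skip fields without a normalized type or without a detected value.
                if (field.type() == null || field.valueDetection() == null) {
                    continue;
                }
                String key = field.type().text();
                // OTHER covers many unrelated fields, so key those by their printed label instead.
                if ("OTHER".equals(key) && field.labelDetection() != null) {
                    key = field.labelDetection().text();
                }
                Float c = field.valueDetection().confidence();
                float confidence = c == null ? 0f : c;
                // If the same key occurs more than once, keep the value with the highest confidence.
                Float previous = bestConfidence.get(key);
                if (previous == null || confidence > previous) {
                    bestConfidence.put(key, confidence);
                    values.put(key, field.valueDetection().text());
                }
            }
        }
        return values;
    }
}
```

Usage, with `textractclient` and `request` built as in the question: call `ExpenseFieldExtractor.summaryFields(textractclient.analyzeExpense(request))` and read e.g. `get("INVOICE_RECEIPT_ID")` or `get("TOTAL")`. Line items are not covered here; they sit under `lineItemGroups()` → `lineItems()` → `lineItemExpenseFields()` and can be walked the same way.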
Thanks & Regards,
Sameh
Thank you. This is very helpful and I am able to extract the data.
However, one of the fields (Invoice Total) is picked from the wrong place. How can I teach Textract this invoice and tell AWS where to pick the correct total from, so that if this invoice is sent again, AWS will pick the correct value?
As far as I know, AWS Textract is self-learning and uses AI to extract the data from the invoice, so is there any way I can tell AWS the correct location to pick the value from?
Hi, unfortunately we don't support custom config or training for AnalyzeExpense. We will keep improving the model quality to provide more accurate detections.
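Since AnalyzeExpense cannot be trained, one possible workaround is purely client-side: if the response contains more than one summary field of type `TOTAL`, list them with their printed label, confidence, and position, and choose one with your own rule. The sketch below assumes SDK v2 and Java 16+ (for the record); `TotalPicker`, `Candidate`, and the label-matching rule are hypothetical and are not a Textract feature. It does not help when the correct value was never returned as a `TOTAL` candidate at all.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

import software.amazon.awssdk.services.textract.model.*;

/** Hypothetical helper: lists candidate values for one field type and applies a local selection rule. */
final class TotalPicker {

    /** One detected value with its printed label, confidence and vertical position (0 = top of page). */
    record Candidate(String label, String value, float confidence, float top) {}

    /** Collects every summary field of the given normalized type (for example "TOTAL"). */
    static List<Candidate> candidates(AnalyzeExpenseResponse response, String type) {
        List<Candidate> out = new ArrayList<>();
        for (ExpenseDocument doc : response.expenseDocuments()) {
            for (ExpenseField f : doc.summaryFields()) {
                if (f.type() == null || !type.equals(f.type().text()) || f.valueDetection() == null) {
                    continue;
                }
                ExpenseDetection v = f.valueDetection();
                String label = f.labelDetection() == null ? "" : f.labelDetection().text();
                BoundingBox box = v.geometry() == null ? null : v.geometry().boundingBox();
                // NaN when Textract returned no geometry for the value
                float top = (box == null || box.top() == null) ? Float.NaN : box.top();
                float confidence = v.confidence() == null ? 0f : v.confidence();
                out.add(new Candidate(label, v.text(), confidence, top));
            }
        }
        return out;
    }

    /**
     * Local rule: prefer the first candidate whose label contains preferredLabel
     * (case-insensitive); otherwise fall back to the highest-confidence candidate.
     */
    static Optional<Candidate> pick(List<Candidate> candidates, String preferredLabel) {
        String wanted = preferredLabel.toLowerCase();
        return candidates.stream()
                .filter(c -> c.label() != null && c.label().toLowerCase().contains(wanted))
                .findFirst()
                .or(() -> candidates.stream().max(Comparator.comparingDouble(Candidate::confidence)));
    }
}
```

The preferred label (for example per `VENDOR_NAME` value) would have to live in your own configuration; Textract itself does not learn from it.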