I believe the short answer to your question is no: today there is no way to tell Textract AnalyzeExpense to consider only the summary fields in order to save time. Neither the synchronous AnalyzeExpense API nor the asynchronous StartExpenseAnalysis API has a parameter for this.
If you're using the async APIs, do make sure you're using the event-driven SNS call-back rather than polling the job status (which introduces poll wait delay).
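As a rough illustration of the SNS call-back approach, here is a minimal Python sketch. It assumes `textract` is a boto3 Textract client (`boto3.client("textract")`), that the SNS topic and IAM role already exist, and that a Lambda function is subscribed to the topic. The function names are made up for this example, and the notification message is assumed to carry `JobId` and `Status` keys.

```python
import json


def start_expense_job(textract, bucket, key, topic_arn, role_arn):
    """Start an async expense analysis that publishes its completion to SNS.

    `textract` is assumed to be a boto3 Textract client; the ARNs must point
    at a topic and a role that Textract is allowed to publish with.
    """
    response = textract.start_expense_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        NotificationChannel={"SNSTopicArn": topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def completed_job_ids(sns_event):
    """Return the JobIds of succeeded jobs from an SNS-triggered Lambda event."""
    job_ids = []
    for record in sns_event.get("Records", []):
        # The SNS message body is a JSON string describing the finished job.
        message = json.loads(record["Sns"]["Message"])
        if message.get("Status") == "SUCCEEDED":
            job_ids.append(message["JobId"])
    return job_ids
```

With this in place the results are fetched only once the notification arrives, so no time is spent sleeping between status polls.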
If your workload is small and low-concurrency (no or minimal parallel requests), you could explore splitting the input documents yourself and using the synchronous AnalyzeExpense API. This approach wouldn't scale well because of the Textract quota limits, though, and I wouldn't guarantee without testing that it'd be faster than async anyway.
Like the other answer mentioned, you could also try optimizing the documents themselves for faster processing.
To extract just the summary fields from a startExpenseAnalysis call with Textract, you can follow these steps:
- Use the StartExpenseAnalysis API to initiate the asynchronous analysis of the invoices or receipts stored in an Amazon S3 bucket. This API will return a JobId that you can use to retrieve the results later.
- Use the GetExpenseAnalysis API to retrieve the results of the expense analysis operation. The response will contain a SummaryFields section that includes the extracted summary information, such as the total amount, currency, and other high-level details.
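The two steps above could look roughly like the following on the retrieval side. This is a sketch, not a definitive implementation: `textract` is assumed to be a boto3 Textract client, `collect_summary_fields` is a name invented for this example, and any status other than SUCCEEDED (including partial success) is simply treated as an error.

```python
def collect_summary_fields(textract, job_id):
    """Page through GetExpenseAnalysis and gather summary fields per document.

    Returns {expense_index: {field_type: [values, ...]}}.
    """
    documents = {}
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract.get_expense_analysis(**kwargs)
        if page.get("JobStatus") != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} is {page.get('JobStatus')}")
        for doc in page.get("ExpenseDocuments", []):
            fields = documents.setdefault(doc.get("ExpenseIndex"), {})
            for field in doc.get("SummaryFields", []):
                # Type.Text is the normalized field name, e.g. TOTAL.
                name = field.get("Type", {}).get("Text", "OTHER")
                value = field.get("ValueDetection", {}).get("Text", "")
                fields.setdefault(name, []).append(value)
        next_token = page.get("NextToken")
        if not next_token:
            return documents
```

Note that this only discards the line items on the client side; Textract still analyzes the whole document, so it does not shorten the processing time.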
To reduce the response time to less than 30 seconds, you can consider the following approaches:
- Optimize the input documents: Ensure that the invoices or receipts are in a format (JPEG, PNG, or PDF) that Textract can process efficiently. Also, make sure the documents are of good quality and not too large.
- Use asynchronous processing: As mentioned, the StartExpenseAnalysis API initiates an asynchronous operation, and you retrieve the results later using the GetExpenseAnalysis API. This keeps the caller from blocking while the document is processed, although it is not guaranteed to finish sooner than a synchronous call.
- Implement caching: If you are processing the same documents repeatedly, you can cache the summary results to avoid re-processing the documents every time.
- Optimize your API Gateway endpoint: Ensure that your API Gateway endpoint is configured correctly, with appropriate caching, throttling, and other performance-related settings.
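For the caching idea in the list above, one possible sketch is to key results on a hash of the document bytes so that an identical upload never triggers a second analysis. Everything here is hypothetical: `SummaryCache` and the `analyze` callable (document bytes in, summary dict out) are names made up for the example, and an in-memory dict stands in for whatever store you would really use.

```python
import hashlib


class SummaryCache:
    """Cache summary results keyed by the SHA-256 of the document content."""

    def __init__(self, analyze):
        # `analyze` is a hypothetical callable: bytes -> summary dict.
        self._analyze = analyze
        self._results = {}

    def get(self, document_bytes):
        key = hashlib.sha256(document_bytes).hexdigest()
        if key not in self._results:
            # Cache miss: run the (slow) analysis once and remember it.
            self._results[key] = self._analyze(document_bytes)
        return self._results[key]
```

This only helps when the exact same file is submitted more than once; a rescan of the same invoice produces different bytes and therefore a different key.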
Note that the SummaryFields section of the GetExpenseAnalysis response will only contain the high-level summary information, and not the line item details. If you need to extract the line item information as well, you can access the LineItemGroups section of the response.
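If you do need the line items, a minimal sketch for flattening LineItemGroups into rows might look like this. It assumes `expense_document` is one entry of the response's ExpenseDocuments list, already parsed into a Python dict; `extract_line_items` is a name chosen for this example.

```python
def extract_line_items(expense_document):
    """Flatten LineItemGroups into a list of {field_type: value} rows."""
    rows = []
    for group in expense_document.get("LineItemGroups", []):
        for item in group.get("LineItems", []):
            row = {}
            for field in item.get("LineItemExpenseFields", []):
                # Type.Text is the normalized field name, e.g. ITEM or PRICE.
                name = field.get("Type", {}).get("Text", "OTHER")
                row[name] = field.get("ValueDetection", {}).get("Text", "")
            rows.append(row)
    return rows
```

Each row corresponds to one line item, so a receipt with three purchased products yields a list of three dicts.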
StartExpenseAnalysis - Amazon Textract
Thanks for the answer.
That's basically what I'm doing so far. I probably didn't ask clearly, but I'd like to configure the StartExpenseAnalysisRequest or GetExpenseAnalysisRequest in some way so that the response only includes the SummaryFields.
I haven't tried this, but I think you can use FeatureTypes.
Yeah, but those FeatureTypes only work with DocumentAnalysis and I'm working with ExpenseAnalysis. Thanks for the response.