We don't lose much time in Step 2, since it is handled by a separate microservice. Provisioned concurrency could be a useful solution.
No, we don't use asynchronous Textract calls. We call start_document_text_detection and then start_document_analysis, each with a different NotificationChannel argument.
If you process only a few documents, then you are probably affected by Lambda cold starts. Depending on the runtime (language) your Lambda functions use, cold starts add more or less latency to the pipeline.
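One way to check whether cold starts are really the cause is to have the function report it. The sketch below relies on the fact that module-level code runs once per execution environment, so a module-level flag is only `True` on the first invocation in that environment. The handler and its return fields are hypothetical; in practice you would log these values instead of returning them.

```python
import time

# Module-level code runs once per execution environment (i.e. on a cold start);
# warm invocations reuse the already-initialized module and skip these lines.
_INIT_TIME = time.monotonic()
_is_cold_start = True


def handler(event, context):
    """Hypothetical handler that reports whether this invocation was a cold start."""
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False  # every later call in this environment is warm
    # ... the function's real work would go here ...
    return {
        "cold_start": cold,
        "seconds_since_init": time.monotonic() - _INIT_TIME,
    }
```

If most slow runs report `cold_start: True`, the latency is likely initialization rather than Textract itself.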
If cold starts turn out to be the cause, you can configure provisioned concurrency (https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html) for the functions. Note that this adds cost to the pipeline.
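A minimal boto3 sketch of enabling provisioned concurrency, assuming the function already has a published alias (provisioned concurrency cannot target `$LATEST`). The function name, alias, and count are placeholders; the client can be injected so the helper can be tried without AWS access.

```python
def enable_provisioned_concurrency(function_name, qualifier, executions, client=None):
    """Keep `executions` pre-initialized environments ready for an alias or version.

    `function_name` and `qualifier` are placeholders for your own resources.
    """
    if qualifier == "$LATEST":
        raise ValueError("provisioned concurrency needs a published version or alias")
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub
        client = boto3.client("lambda")
    return client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=executions,
    )
```

Each provisioned environment is billed while configured, so a small number kept on the latency-sensitive functions is usually enough to test the effect.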
You can also eliminate the Lambda function in Step 2 by using the S3 EventBridge integration and then defining a rule that sends the S3 event to SQS.
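A rough sketch of the EventBridge rule with boto3. It assumes two things that are not shown: EventBridge notifications are already enabled on the bucket, and the SQS queue policy allows `events.amazonaws.com` to call `sqs:SendMessage`. The bucket, queue ARN, rule name, and target `Id` are placeholders.

```python
import json


def s3_object_created_pattern(bucket_name):
    """EventBridge event pattern matching 'Object Created' events from one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }


def route_s3_uploads_to_sqs(bucket_name, queue_arn, rule_name, events=None):
    """Create (or update) a rule that forwards the bucket's uploads to an SQS queue."""
    if events is None:
        import boto3  # imported lazily so the helper can be exercised with a stub
        events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(s3_object_created_pattern(bucket_name)),
        State="ENABLED",
    )
    return events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "sqs-target", "Arn": queue_arn}],
    )
```

With this in place, the S3 upload lands in the queue directly, and the consumer that starts the Textract job reads from SQS without an intermediate Lambda.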
It sounds like you are using asynchronous Textract calls. These do take some time to process, and as far as I know, processing times can vary and are not guaranteed. If you already have the pages extracted in advance, you can call the synchronous methods instead, which should be much faster.
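A minimal sketch of the synchronous path, assuming each page is already stored in S3 as its own single-page image or PDF (the synchronous operations handle one page per call). The bucket and key are placeholders. `detect_document_text` is the synchronous counterpart of `start_document_text_detection`; for forms or tables, `analyze_document` plays the same role for `start_document_analysis`.

```python
def detect_lines_sync(bucket, key, textract=None):
    """Run synchronous text detection on one page stored in S3.

    Returns the detected LINE texts in the order Textract reports them.
    """
    if textract is None:
        import boto3  # imported lazily so the helper can be exercised with a stub
        textract = boto3.client("textract")
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
```

The result comes back in the same call, so there is no SNS notification or job polling to wait for.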
Would sending the base64-encoded image bytes directly to Textract help?
Visually speaking:
base64 -> API Gateway -> Lambda -> Textract
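A rough sketch of the Lambda step in that diagram, assuming an API Gateway proxy integration whose request body is the base64 text of a single-page image. The handler name and response shape are hypothetical. Note that boto3 expects raw bytes in `Document={"Bytes": ...}` and handles the wire encoding itself, so the handler decodes the base64 once before calling Textract. Request payload limits of API Gateway and Lambda still apply.

```python
import base64
import json


def ocr_handler(event, context, textract=None):
    """Hypothetical API Gateway handler: base64 image in, detected lines out."""
    body = event.get("body") or ""
    try:
        image_bytes = base64.b64decode(body, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return {"statusCode": 400, "body": json.dumps({"error": "body is not base64"})}
    if textract is None:
        import boto3  # imported lazily so the handler can be exercised with a stub
        textract = boto3.client("textract")
    # Pass raw bytes; the SDK base64-encodes them on the wire.
    response = textract.detect_document_text(Document={"Bytes": image_bytes})
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return {"statusCode": 200, "body": json.dumps({"lines": lines})}
```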
The start_document_text_detection operation is indeed an asynchronous Textract API. You can find more about it here: https://docs.aws.amazon.com/textract/latest/dg/API_StartDocumentTextDetection.html
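To illustrate why it counts as asynchronous: the start call returns only a `JobId`, and the text is fetched later with `get_document_text_detection` once the job finishes (the SNS topic in `NotificationChannel` announces completion). A minimal sketch, with the bucket, key, topic ARN, and role ARN as placeholders; `collect_lines` assumes the job has already reported success.

```python
def start_text_detection(bucket, key, sns_topic_arn, role_arn, textract=None):
    """Start an async text-detection job; completion is published to the SNS topic."""
    if textract is None:
        import boto3  # imported lazily so the helper can be exercised with a stub
        textract = boto3.client("textract")
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]  # the text itself is not in this response


def collect_lines(job_id, textract):
    """Page through the finished job's results and return all LINE texts."""
    lines, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = textract.get_document_text_detection(**kwargs)
        if page["JobStatus"] != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} status is {page['JobStatus']}")
        lines += [b["Text"] for b in page.get("Blocks", []) if b["BlockType"] == "LINE"]
        token = page.get("NextToken")
        if not token:
            return lines
```

The time between the start call and the SNS notification is queueing and processing on the Textract side, which is the part that can vary.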