Please find my comments for Answer 1: Modify the Lambda function that triggers on file upload to handle multiple files. This could involve iterating over a batch of files uploaded to S3 and implementing an SQS queue to hold messages that reference the files that need processing, so that they are handled asynchronously. Finally, you would need to invoke AWS Lambda in an asynchronous manner.
Answer 2: Ensure that your post-processing logic can identify and handle duplicate entries that might appear across pages. Merging multiple pages into a single PDF, as you've mentioned, can be a solution; however, as you're concerned, it can lead to large file sizes.
Hello, please find my comments for Answer 1:
1. S3 Bucket for File Storage: Instead of triggering the Lambda function directly from the upload, store the uploaded files in an S3 bucket. This allows you to handle multiple files and provides a durable storage solution.
2. S3 Event Trigger: Configure an S3 event trigger on the bucket so that when new files are uploaded, a Lambda function is invoked.
3. Lambda Function to Queue Jobs: Modify your Lambda function to be a queue handler instead of directly starting the analysis. When triggered by the S3 event, the Lambda function should pick up the uploaded file(s) and enqueue a job for each file in a scalable queue service like Amazon Simple Queue Service (SQS).
4. SQS for Job Queue: Create an SQS queue where each message in the queue represents a job to be processed.
5. Asynchronous Processing Lambda: Implement a separate Lambda function that polls the SQS queue for new jobs. When a job is received, this Lambda function can then invoke the startAnalyzeExpense function for each file.
6. Result Handling: Depending on your requirements, you can handle the results differently. You might store the processed data in a database, notify users via email, or store the results back in S3.
7. Monitoring and Error Handling: Implement monitoring and error handling in your architecture. For example, CloudWatch can be used to monitor the performance of Lambda functions, and you can set up dead-letter queues in SQS for handling failed processing attempts.
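For the dead-letter queue part, one way is to set the main queue's RedrivePolicy attribute so that messages that repeatedly fail processing are moved aside. A minimal Python sketch (the DLQ ARN and retry count below are placeholders, not values from your setup):

```python
import json

def redrive_policy(dlq_arn, max_receive_count=3):
    """Build the RedrivePolicy attribute value for an SQS queue: after
    max_receive_count failed receives, SQS moves the message to the DLQ."""
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })

# The resulting string would then be applied to the main queue, e.g. with
# boto3.client("sqs").set_queue_attributes(
#     QueueUrl=queue_url, Attributes={"RedrivePolicy": redrive_policy(dlq_arn)})
```

Messages landing in the DLQ can then be inspected or replayed once the underlying failure is fixed.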
Here's a simplified flow:
- User uploads files to an S3 bucket.
- S3 bucket triggers an event.
- The Lambda function (Queue Handler) is triggered by the S3 event.
- The Lambda function enqueues a message in an SQS queue for each file.
- Another Lambda function (Processing Lambda) polls the SQS queue for new jobs.
- The Processing Lambda function invokes the startAnalyzeExpense function for each file.
- Processed results are stored or handled accordingly.
This approach allows you to handle multiple files asynchronously, making your system more scalable and flexible. Additionally, it provides better error handling and monitoring capabilities.
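As a rough Python sketch of the queue-handler step above (the boto3 send is left as a comment, and the message shape is just one reasonable choice), the Lambda could turn each record of the S3 event into one SQS job message:

```python
import json

def build_sqs_messages(s3_event):
    """One job message per uploaded file: a single S3 event can carry
    several records, so iterate over all of them."""
    return [
        json.dumps({"bucket": r["s3"]["bucket"]["name"],
                    "key": r["s3"]["object"]["key"]})
        for r in s3_event.get("Records", [])
    ]

# Inside the Lambda handler, each body would then be sent with
# boto3.client("sqs").send_message(QueueUrl=queue_url, MessageBody=body),
# where queue_url points at the job queue.
```

Keeping the message-building logic separate from the send call also makes the handler easy to unit-test without AWS access.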
Hello, please find my comments for Answer 2:
1. Review API Responses: Check the responses you receive from the startAnalyzeExpense API call. See if there are any identifiers or metadata that can help you link or group the results for multi-page documents.
2. Combine Results Client-Side: If you're receiving separate results for each page, you may consider combining the results on your end after receiving them. This would involve processing the individual results and merging them into a cohesive structure.
3. Custom Post-Processing: Develop a custom post-processing step where you analyze the individual results and consolidate them based on your business logic. You can use information like document identifiers or page numbers to associate and merge the results.
4. Limit the Number of Pages: If file size is a concern, and you're still considering merging PDFs, you could explore limiting the number of pages in each file before merging. This might help control the resulting file size while still ensuring that the documents are processed correctly.
5. Optimize PDFs: Before merging, you can consider optimizing the individual PDFs to reduce file size. There are tools and libraries available that can help compress and optimize PDF files without losing important information.
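A simplified sketch of points 2 and 3, assuming each page's result has already been reduced to a list of {Type, Value} fields (a simplification of Textract's actual expense output): merge the per-page lists in order and drop exact duplicates that repeat across pages.

```python
def merge_page_results(pages):
    """Merge per-page field lists into one list, keeping page order and
    dropping exact (Type, Value) duplicates that recur on later pages."""
    seen = set()
    merged = []
    for page in pages:
        for field in page:
            key = (field["Type"], field["Value"])
            if key not in seen:
                seen.add(key)
                merged.append(field)
    return merged
```

Real deduplication logic would likely be fuzzier (e.g. a vendor name with slightly different OCR on two pages), but the structure is the same.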
Thank you both! So it would be something like this?
- Upload a File to S3: When a file is uploaded to your S3 bucket, an event notification is automatically sent to an SQS queue.
- SQS Queues the Event: The SQS queue receives the event notification and holds it as a message until a consumer picks it up.
- Trigger a Lambda Function: A separate Lambda function is triggered by the SQS queue. This function retrieves the event details from the SQS message and processes the file accordingly.
- Process the File: The Lambda function uses the Amazon Textract service to analyze the file and performs any necessary actions.
While investigating, I came across EventBridge as another option, though it seems geared toward more complex scenarios.
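Roughly, yes. A minimal Python sketch of the processing Lambda under that flow, assuming each SQS record's body carries the raw S3 event notification JSON (the Textract call is left as a comment since it needs real AWS credentials):

```python
import json

def extract_file_refs(sqs_event):
    """Each SQS record's body is the S3 event notification that S3
    delivered to the queue; pull out every (bucket, key) pair."""
    refs = []
    for sqs_record in sqs_event.get("Records", []):
        s3_event = json.loads(sqs_record["body"])
        for s3_record in s3_event.get("Records", []):
            refs.append((s3_record["s3"]["bucket"]["name"],
                         s3_record["s3"]["object"]["key"]))
    return refs

def handler(event, context):
    for bucket, key in extract_file_refs(event):
        # Kick off the asynchronous Textract job for this file, e.g.:
        # boto3.client("textract").start_expense_analysis(
        #     DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}})
        print(f"would start expense analysis for s3://{bucket}/{key}")
```

Note that S3 also delivers a test event ("s3:TestEvent") with no "Records" list when the notification is first configured, which the `.get(..., [])` lookups tolerate.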
@Jagan, what do you mean by invoking AWS Lambda in an asynchronous manner? I'm already using an async/await handler function with startExpenseAnalysis and getExpenseAnalysis, which are the asynchronous APIs, instead of the synchronous analyzeExpense.
- Accepted Answer · asked 3 years ago