I understand that you would like to know whether you can get more detailed logs from Textract. Unfortunately, Textract logging is limited: what you are currently seeing is all the logging that is currently supported. You can find more information by checking the CloudTrail API calls, either manually in the CloudTrail console or by setting up logging with CloudWatch to view your CloudTrail logs [1]. That error usually occurs when the document does not meet the criteria listed here [2], or when the document is corrupted or encoded incorrectly.
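As a rough illustration of the CloudTrail check described above, the sketch below pulls recent Textract events and prints any error code and message recorded with them. It assumes a boto3 CloudTrail client is passed in, that your Textract calls show up in CloudTrail event history, and that failed calls carry `errorCode` and `errorMessage` fields in the event record. The function name `summarize_textract_events` is hypothetical.

```python
import json


def summarize_textract_events(cloudtrail_client, max_results=50):
    """Return (event name, error code, error message) for recent Textract calls.

    `cloudtrail_client` is expected to behave like boto3.client("cloudtrail").
    """
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "textract.amazonaws.com"}
        ],
        MaxResults=max_results,
    )
    summaries = []
    for event in response.get("Events", []):
        # The full event record is delivered as a JSON string.
        detail = json.loads(event["CloudTrailEvent"])
        summaries.append(
            (event.get("EventName"), detail.get("errorCode"), detail.get("errorMessage"))
        )
    return summaries


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# for row in summarize_textract_events(boto3.client("cloudtrail")):
#     print(row)
```

Note that the error message recorded in CloudTrail may be no more specific than the one returned by the API call itself.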
For PDFs with more than 3000 pages, I recommend splitting the PDF into batches so that each batch falls within the accepted page range. I have also provided an external link to PDF splitter code you can implement [3]. In addition, for images above 10 MB, I recommend decreasing the resolution until they are under the 10 MB limit. OpenCV is one option for this.
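Since the error itself does not say which criterion was violated, a local pre-check before calling Textract can help. The sketch below plans page ranges for splitting and lists which limits a document exceeds. The 3000-page and 10 MB values come from the advice above; the 2880-point limit per side (40 inches at 72 points per inch) is an assumption that should be confirmed against the current Textract quotas. The names `page_batches` and `limit_violations` are hypothetical.

```python
MAX_PAGES = 3000                    # page limit mentioned above
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB image limit mentioned above
MAX_SIDE_POINTS = 2880              # assumption: 40 inches at 72 points per inch


def page_batches(total_pages, max_pages=MAX_PAGES):
    """Return inclusive, 1-based (first, last) page ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages must be >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def limit_violations(page_count, size_bytes, page_sizes_pt=()):
    """List, in plain words, every limit the document exceeds.

    page_sizes_pt is an optional sequence of (width, height) in points,
    one entry per page.
    """
    problems = []
    if page_count > MAX_PAGES:
        problems.append(f"page count {page_count} exceeds {MAX_PAGES}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file size {size_bytes} bytes exceeds {MAX_IMAGE_BYTES}")
    for number, (width, height) in enumerate(page_sizes_pt, start=1):
        if max(width, height) > MAX_SIDE_POINTS:
            problems.append(
                f"page {number} is {width}x{height} pt, above {MAX_SIDE_POINTS} pt per side"
            )
    return problems


print(page_batches(7000))
print(limit_violations(3001, 11 * 1024 * 1024, [(612, 792), (3000, 792)]))
```

The page count and per-page dimensions would have to be read with a PDF library of your choice; the functions above only do the arithmetic and reporting.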
Resources:
[1] https://docs.aws.amazon.com/textract/latest/dg/logging-using-cloudtrail.html
[2] https://docs.aws.amazon.com/textract/latest/dg/API_Document.html
[3] https://github.com/x4nth055/pythoncode-tutorials/tree/master/handling-pdf-files/split-pdf
Thanks for the note! Specifically, I wanted to see whether there was a way to get more granularity on which of the criteria a document fails when it fails. For my documents, I found that the problem was their overall physical size, not the data size, but the error did not specify that. I appreciate your response.