Amazon Textract misses digits before and after commas and periods


Hello,

I am using Amazon Textract to transcribe tables into CSV output. While it usually does a very good job, it sometimes drops the leading digits before a comma and the trailing digits after a decimal point. I have attached an image that shows the problem in the demo version. Has anyone encountered this problem or found a way to fix it? As you can see, Textract does not always fail to capture the full number, but the failures seem to me to be correlated with the punctuation. My problem seems similar to one posted many years ago (https://repost.aws/questions/QUJgifajQpQYesjrkIocR9lw/numbers-amount-reading-problem), but the solution there seems to have been relayed in a private message. Any help would be much appreciated, and I am happy to answer questions if anything needs clarification. Thank you!

Textract Screenshot
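Until the recognition issue itself is resolved, one possible workaround is to post-process the AnalyzeDocument output and flag suspicious cells for manual review instead of trusting the CSV blindly. This is a minimal sketch, not an official fix. It assumes the response comes from `analyze_document` with `FeatureTypes=["TABLES"]`, rebuilds each table from its `TABLE`, `CELL`, and `WORD` blocks, and flags cells whose text begins or ends with a comma or period (e.g. `,234` or `12.`) or contains a word below a confidence threshold. The function names `tables_from_response` and `to_csv`, the regex heuristics, and the 90.0 default threshold are hypothetical choices for illustration. A digit that disappears together with its separator will not be caught by the pattern check.

```python
import csv
import io
import re
from typing import Any, Dict, List, Tuple

# Hypothetical heuristics: a cell that starts with a separator (",234", ".5")
# or ends with one ("12.", "1,") probably lost digits next to the punctuation.
SUSPECT_PATTERNS = (
    re.compile(r"^[,.]\d"),  # digits missing before the separator
    re.compile(r"\d[,.]$"),  # digits missing after the separator
)


def tables_from_response(response: Dict[str, Any], min_confidence: float = 90.0):
    """Rebuild each TABLE block of an AnalyzeDocument response into a grid.

    Returns a list of (grid, flagged) pairs: grid is a list of rows of cell
    text, and flagged holds (row, col, text, reason) tuples (1-based indices)
    for cells worth checking against the source image.
    """
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def child_ids(block):
        # Empty cells may have no Relationships at all.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    results: List[Tuple[List[List[str]], list]] = []
    for table in (b for b in response["Blocks"] if b["BlockType"] == "TABLE"):
        cells = [blocks[i] for i in child_ids(table) if blocks[i]["BlockType"] == "CELL"]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        flagged = []
        for cell in cells:
            # Keep only WORD children (skips e.g. selection elements).
            words = [blocks[i] for i in child_ids(cell) if blocks[i]["BlockType"] == "WORD"]
            text = " ".join(w["Text"] for w in words)
            r, c = cell["RowIndex"], cell["ColumnIndex"]
            grid[r - 1][c - 1] = text
            if any(p.search(text) for p in SUSPECT_PATTERNS):
                flagged.append((r, c, text, "separator at edge"))
            elif any(w["Confidence"] < min_confidence for w in words):
                flagged.append((r, c, text, "low confidence"))
        results.append((grid, flagged))
    return results


def to_csv(rows: List[List[str]]) -> str:
    """Serialize a grid to CSV; values containing commas are quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

With boto3, the response would come from a call such as `boto3.client("textract").analyze_document(Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"])`. Writing the grid through the `csv` module also matters here: it quotes values like `1,234`, so a thousands separator is not mistaken for a column delimiter when the CSV is read back.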

  • Can you share the original document or at least parts of it? Then I could run some tests to validate.

  • Thank you for using Textract. To better assist you, we will need to try this out on our end with the original image and gather a few details. It would be helpful if you could share the original image; alternatively, you can create a support ticket, and our support engineers will look into this for you.

  • Hello, thank you both for your responses. The file hosted here [https://drive.google.com/file/d/1yGlYK6BI5popn9uwQ971AnUe4vLcCRjL/view?usp=sharing] is probably not the exact one I used above, but it exhibits the same problematic behavior. I very much appreciate your attention to this matter.

Jackson
asked 2 years ago · 126 views
No Answers
