Amazon Textract misses digits before and after commas and periods


Hello,

I am using Amazon Textract to transcribe tables into CSV output. It usually does a very good job, but it sometimes drops the leading digits before a comma and the trailing digits after a decimal point. I have attached an image that shows the problem in the demo version. As you can see, Textract does not always fail to capture the full number, and the failures seem to me to be correlated with the punctuation. Has anyone encountered this problem, or does anyone know of a way to fix it? My problem seems similar to one posted many years ago (https://repost.aws/questions/QUJgifajQpQYesjrkIocR9lw/numbers-amount-reading-problem), but the solution there seems to have been relayed in a private message. Any help would be much appreciated, and I am happy to answer questions if further clarification is needed. Thank you!

[Image: Textract screenshot]

  • Can you share the original document or at least parts of it? Then I could run some tests to validate.

  • Thank you for using Textract. To better assist you, we will need to try this out at our end with the original image and gather a few details. It would be helpful if you can share the original image or you can also create a support ticket, and we will have our support engineer look into this for you.

  • Hello, thank you for both responses. I don't think the file hosted here [https://drive.google.com/file/d/1yGlYK6BI5popn9uwQ971AnUe4vLcCRjL/view?usp=sharing] is exactly the same one I used above, but it exhibits the same problematic behavior. I very much appreciate your attention to this matter.
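
For anyone who wants to test the shared file, one way to narrow the problem down is to look at per-word confidence in the raw AnalyzeDocument response instead of the CSV export. The sketch below is not a fix and is not taken from this thread. It assumes the response is a dict in the usual Textract shape (a `Blocks` list containing `CELL` and `WORD` blocks linked by `CHILD` relationships). The function name `cells_with_confidence` and the threshold of 90 are hypothetical choices made for illustration.

```python
from __future__ import annotations

from typing import Any

# Assumed way to obtain `response` (needs AWS credentials, not run here):
#   import boto3
#   response = boto3.client("textract").analyze_document(
#       Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"])


def cells_with_confidence(
    response: dict[str, Any], threshold: float = 90.0
) -> list[dict[str, Any]]:
    """Return one record per table cell with its text and lowest word confidence.

    `threshold` is an arbitrary cut-off chosen for illustration only.
    """
    # Index every block by Id so CHILD relationships can be resolved.
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    records = []
    for block in blocks.values():
        if block.get("BlockType") != "CELL":
            continue
        # Collect the WORD blocks that belong to this cell.
        words = []
        for rel in block.get("Relationships", []):
            if rel.get("Type") != "CHILD":
                continue
            for child_id in rel.get("Ids", []):
                child = blocks.get(child_id)
                if child and child.get("BlockType") == "WORD":
                    words.append(child)
        text = " ".join(w["Text"] for w in words)
        # Empty cells have no words, so their confidence is reported as None.
        min_conf = min((w["Confidence"] for w in words), default=None)
        records.append(
            {
                "row": block["RowIndex"],
                "col": block["ColumnIndex"],
                "text": text,
                "min_confidence": min_conf,
                "suspect": min_conf is not None and min_conf < threshold,
            }
        )
    # Sort into reading order: by row, then by column.
    records.sort(key=lambda r: (r["row"], r["col"]))
    return records
```

If the truncated numbers show up as `suspect`, that at least gives a way to route those cells to manual review. If they come back with high confidence, the confidence score is not a useful signal here and the original image is probably needed to investigate further.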

Jackson
Asked 2 years ago · 129 views
No answers
