Amazon Textract misses digits before and after commas and periods

Hello,

I am using Amazon Textract to transcribe tables into CSV output. It usually does a very good job, but it sometimes drops the leading digits before a comma and the trailing digits after a decimal point. I have attached an image that shows the problem in the demo version. Has anyone encountered this problem, or does anyone know of a way to fix it? As you can see, Textract does not always fail to capture the full number, and the failures seem to me to be correlated with the punctuation. My problem seems similar to one posted many years ago (https://repost.aws/questions/QUJgifajQpQYesjrkIocR9lw/numbers-amount-reading-problem), but the solution seems to have been relayed in a private message. Any help, or any questions that would clarify the problem, would be much appreciated. Thank you!

Textract Screenshot
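
For anyone triaging the same symptom, here is a minimal sketch of one way the affected cells might be flagged in the exported CSV. It does not call Textract and does not fix the extraction. It only checks whether numeric-looking cells are well formed, assuming US-style formatting (comma as thousands separator, period as decimal point). The function name `suspicious_cells` and both regular expressions are hypothetical, chosen for illustration.

```python
import csv
import io
import re

# Assumption: numbers use commas for thousands and a period for decimals.
# Well formed: optional sign, plain digits or comma-grouped thousands,
# then an optional decimal part with at least one digit.
WELL_FORMED = re.compile(r"^[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$")

# Numeric-looking: an optional sign followed only by digits, commas, periods.
NUMERIC_LIKE = re.compile(r"^[+-]?[\d,.]+$")


def suspicious_cells(csv_text):
    """Return (row, column, value) for numeric-looking cells that are malformed."""
    flagged = []
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, cell in enumerate(row):
            value = cell.strip()
            if value and NUMERIC_LIKE.match(value) and not WELL_FORMED.match(value):
                flagged.append((r, c, value))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the original document.
    sample = 'Item,Amount\nA,"1,234.56"\nB,",234.56"\nC,"1,234."\nD,12\n'
    for row, col, value in suspicious_cells(sample):
        print(f"row {row}, column {col}: {value!r}")
```

This only catches truncations that leave a stray comma or period behind, such as `,234.56` or `1,234.`. A value that loses digits but still looks valid (for example `1,234.56` read as `234.56`) would pass unnoticed, so the check is a first filter rather than a guarantee.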

  • Can you share the original document or at least parts of it? Then I could run some tests to validate.

  • Thank you for using Textract. To better assist you, we will need to try this out on our end with the original image and gather a few details. It would be helpful if you could share the original image. Alternatively, you can create a support ticket and we will have a support engineer look into this for you.

  • Hello, thank you for both responses. I don't think the file hosted here [https://drive.google.com/file/d/1yGlYK6BI5popn9uwQ971AnUe4vLcCRjL/view?usp=sharing] is the exact same one as I used above, but it exhibits the same problematic behavior. I very much appreciate your attention to this matter.

Jackson
Asked 2 years ago, 129 views
No answers
